1.  INTRODUCTION

The status of continuous-time stochastic control theory ten years ago is admirably
summarized in Fleming's 1969 survey paper [40]. The main results, of which a very
brief outline will be found in §2 below and a complete account in the book [41],
concern control of completely-observable diffusion processes, i.e. solutions of
stochastic differential equations. Formal application of Bellman's "dynamic
programming" idea quickly leads to the "Bellman equation" (2.3), a quasi-linear
parabolic equation whose solution, if it exists, is easily shown to be the value
function for the control problem. At this point the probabilistic aspects of the
problem are finished and all the remaining work goes into finding conditions under
which the Bellman equation has a solution. The reason why dynamic programming is a
fruitful approach in stochastic control is precisely that these conditions are so
much weaker than those required in the deterministic case. As regards problems
with partial observation, the best result was Wonham's formulation of the "separation
theorem" [78], which he proved by reformulating the problem as one of complete
observations, with the "state" being the conditional mean estimate produced by the
Kalman filter; see §6 below.


* Work  supported  by  the  U.S.  Air  Force  Office  of  Sponsored  Research  under 
Grant  AFOSR  77-3281  and  by  the  Department  of  Energy  under  Contract  EX-76-A-01-22')5. 


The dynamic programming approach, while successful in many applications, suffers
from many limitations. An immediate one is that the controls have to be smooth
functions of the state in order that the resulting stochastic differential equation
(2.1) have a solution in the Ito sense. This rules out, for example, "bang-bang"
controls which arise naturally in some applications (e.g. [3]). Thus a weaker
formulation of the solution concept seems essential for stochastic control; this was
provided by Stroock and Varadhan [71] for Markov processes and by various forms of
measure transformations, beginning with the Girsanov Theorem [43], for more general
stochastic systems; these are outlined in §3. But even with the availability of
weak solution concepts it seems that the Bellman equation approach is essentially
limited to Markovian systems and that no general formulation of problems with
partial observations is possible. (A Bellman equation for partially observed
diffusions was formally derived by Mortensen [65], but just looking at it convinces
one that some other approach must be tried.)

Since 1969 a variety of different approaches to stochastic control have been
investigated, among them the following (a very partial list). Krylov [51] has
studied generalized solutions of the Bellman equation; methods based on potential
theory [5] and on convex analysis [7] have been introduced by Bismut; necessary
conditions for optimality using general extremal theory have been obtained by
Haussmann [44]; a reformulation of dynamic programming in terms of nonlinear
semigroups has been given by Nisio [66]; variational inequality techniques have been
introduced by Bensoussan and Lions [4], and computational methods systematically
developed by Kushner [54].

This survey outlines the so-called "martingale approach" to stochastic control.
It is based on the idea of formulating Bellman's "principle of optimality" as a
submartingale inequality and then using Meyer's submartingale decomposition [63] to
obtain local conditions for optimality. This is probably the most general form of
dynamic programming and applies to a very general class of controlled processes, as
outlined in §5 below. However, more specific results can be obtained when more
structure is introduced, and for this reason we treat in some detail in §§4,6 the
case of stochastic differential equations, for which the best results so far are
available. Other specific cases are outlined in §7.

I have attempted to compile, in §9, a fairly complete list of references on
this topic and related subjects. Undoubtedly this list will suffer from important
omissions, but readers have my assurance that none of these is intentional. It
should also be mentioned that no systematic coverage of martingale representation
theorems has been attempted, although they are obviously germane to the subject.


2.  CONTROL  OF  DIFFUSION  PROCESSES 

To introduce the connection between dynamic programming and submartingales, let
us consider a control problem where the n-dimensional state process $x_t$ satisfies the
Ito stochastic differential equation

(2.1)  $dx_t = f(t, x_t, u_t)\,dt + \sigma(t, x_t)\,dw_t, \qquad x_0 \in R^n$ given.

Here $w_t$ is an n-dimensional Brownian motion and the components of $f$ and $\sigma$ are
$C^1$ functions of x, u, with bounded derivatives. The control $u_t$ is a feedback of the
current state, i.e. $u_t = u(t, x_t)$ for some given function u(t, x) taking values in
the control set U. If u is Lipschitz in x, then (2.1) is a stochastic differential
equation satisfying the standard Ito conditions and hence has a unique strong solution
$x_t$. The cost associated with u is then

$J(u) = E\Big[\int_0^T c(t, x_t, u_t)\,dt + \Phi(x_T)\Big]$

where T is a fixed terminal time and c, $\Phi$ are, say, bounded measurable functions.
The objective is to choose the function $u(\cdot,\cdot)$ so as to minimize J(u). An extensive
treatment of this kind of problem will be found in Fleming and Rishel's book [41].
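
As a rough numerical illustration of the cost J(u) attached to a feedback control, the following sketch estimates it by Euler-Maruyama simulation of (2.1) in the scalar case. The function name `estimate_cost`, the step counts and the test problems are assumptions made here for illustration; they do not come from the text, and the discretization only approximates the Ito solution.

```python
import math
import random


def estimate_cost(f, sigma, c, phi, u, x0, T=1.0, n_steps=200, n_paths=2000, seed=0):
    """Monte Carlo estimate of J(u) = E[ int_0^T c dt + phi(x_T) ] for scalar (2.1).

    f(t, x, a), sigma(t, x), c(t, x, a), phi(x) and the feedback law u(t, x)
    are plain Python callables (hypothetical interface, scalar state only).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        running = 0.0
        for k in range(n_steps):
            t = k * dt
            a = u(t, x)                      # feedback control value u(t, x_t)
            running += c(t, x, a) * dt       # accumulate the running cost
            # Euler-Maruyama step for dx = f dt + sigma dw
            x += f(t, x, a) * dt + sigma(t, x) * sqrt_dt * rng.gauss(0.0, 1.0)
        total += running + phi(x)            # add the terminal cost
    return total / n_paths
```

For example, with f = 0, sigma = 1, c = 0 and phi(x) = x^2 started at 0, the estimate should be close to T, since x_T is then a Brownian motion at time T.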

Introduce the value function

(2.2)  $V(t, x) = \inf_u E_{(t,x)}\Big[\int_t^T c(s, x_s, u_s)\,ds + \Phi(x_T)\Big]$

Here the subscript (t, x) indicates that the process $x_s$ starts at $x_t = x$, and the
infimum is over all control functions restricted to the interval [t, T]. Formal
application of Bellman's "principle of optimality" together with the differential
formula suggests that V should satisfy the Bellman equation:

(2.3)  $V_t + \tfrac12 \sum_{i,j} (\sigma\sigma')_{ij} V_{x_i x_j} + \min_{u \in U}\,[V_x' f(t,x,u) + c(t,x,u)] = 0, \qquad (t,x) \in [0,T[ \times R^n$

(2.4)  $V(T, x) = \Phi(x), \qquad x \in R^n$

($V_t = \partial V/\partial t$ etc., and $V_t$, etc. are evaluated at (t, x) in (2.3)). There is a
"verification theorem" [41, §VI 4] which states that if V is a solution of (2.3),
(2.4) and $u^0$ is an admissible control with the property that

$V_x'(t,x) f(t,x,u^0(t,x)) + c(t,x,u^0(t,x)) = \min_{u \in U}\,[V_x'(t,x) f(t,x,u) + c(t,x,u)]$

then $u^0$ is optimal. Conditions under which a solution of (2.3), (2.4) is guaranteed
will be found in [41, §VI 6]. Notable among them is the uniform ellipticity
condition: there exists $\kappa > 0$ such that

(2.5)  $\sum_{i,j} (\sigma\sigma')_{ij}\,\xi_i \xi_j \geq \kappa\,|\xi|^2$

for all $\xi \in R^n$. This essentially says that noise enters every component of equation
(2.1), whatever the coordinate system.


Let us reformulate these results in martingale terms, supposing the conditions
are such that (2.3), (2.4) has a solution with suitable growth properties (see below).
For any admissible control function u and corresponding trajectory $x_t$ define a process
$M_t^u$ as follows:

(2.6)  $M_t^u = \int_0^t c(s, x_s, u_s)\,ds + V(t, x_t)$

Note that $M_t^u$ is the minimum expected total cost given the evolution of the process
up to time t. Expanding the function $V(t, x_t)$ by the Ito rule gives

(2.7)  $M_t^u = V(0, x_0) + \int_0^t \Big[V_t + \tfrac12 \sum_{i,j} (\sigma\sigma')_{ij} V_{x_i x_j} + V_x' f^u + c^u\Big]\,ds + \int_0^t V_x' \sigma\,dw$

where $f^u(t,x) = f(t, x, u(t, x))$, and similarly for $c^u$. But note from (2.3) that the
integrand in the second term of (2.7) is always non-negative. Thus this term is an
increasing process. If u is optimal then the integrand is identically zero. Assuming
that the function V is such that the last term is a martingale, we thus have the
following result:

(2.8)  For any admissible u, $M_t^u$ is a submartingale and u is optimal if and only
if $M_t^u$ is a martingale.

The intuitive meaning of the submartingale inequality is clear: the difference

$E[M_t^u \mid x_r,\ r \leq s] - M_s^u$

is simply the expected cost occasioned by persisting in using the non-optimal control
over the time interval [s, t] rather than switching to an optimal control at time s.
The other noteworthy feature of this formulation is that an optimal control is
constructed by minimizing the Hamiltonian

$H(t, x, V_x, u) = V_x' f(t,x,u) + c(t,x,u)$

and, conveniently, the "adjoint variable" $V_x$ is precisely the function that appears
in the integrand of the stochastic integral term in (2.7).
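
A small worked example may make (2.3), (2.7) and (2.8) concrete. The data below (scalar state, f = u, sigma = 1, c = u^2, Phi(x) = x^2, unconstrained control) are chosen here purely for illustration; they are not taken from the text, and they ignore the boundedness assumptions on c and Phi made above.

```latex
% Assumed illustrative data: dx_t = u_t\,dt + dw_t,  c = u^2,  \Phi(x) = x^2,  U = R.
\begin{align*}
 &V_t + \tfrac12 V_{xx} + \min_u\,[V_x u + u^2] = 0, \qquad V(T,x) = x^2, \\
 &\min_u\,[V_x u + u^2] = -\tfrac14 V_x^2 \quad \text{attained at } u = -\tfrac12 V_x, \\
 &V(t,x) = p(t)\,x^2 + q(t), \qquad p(t) = \frac{1}{1+T-t}, \qquad q(t) = \ln(1+T-t), \\
 &u^0(t,x) = -p(t)\,x .
\end{align*}
% Integrand of the second term of (2.7) for an arbitrary feedback u:
\begin{align*}
 V_t + \tfrac12 V_{xx} + V_x u + u^2
   &= (p^2 x^2 - p) + p + 2 p x u + u^2 \\
   &= (u + p x)^2 \;\geq\; 0 ,
\end{align*}
% with equality exactly when u = u^0(t,x).
```

In this example the drift of $M_t^u$ is the squared distance between the control in use and the Hamiltonian minimizer, which is one way of reading the submartingale statement (2.8).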

Abstracted from the above problem, the "martingale approach" to stochastic
control of systems with complete observations (i.e. where the controller has exact
knowledge of the past evolution of the controlled process) consists of the following
steps:

1.  Define the value function $V_t$ and conditional minimal cost processes $M_t^u$ as
in (2.2), (2.6).

2.  Show that the "principle of optimality" holds in the form (2.8).

3.  Construct an optimal policy by minimizing a Hamiltonian, where the adjoint
variable is obtained from the integrand in a stochastic integral representation
of the martingale component in the decomposition of the submartingale $M_t^u$.

In evaluating the cost corresponding to a control policy u in the above problem,
all that is required is the sample space measure induced by the $x_t$ process with
control u. It is also convenient to note that the cost can always be regarded as a
terminal cost by introducing an extra state variable $x_t^0$ defined by

(2.9)  $dx_t^0 = c(t, x_t, u_t)\,dt + dw_t^0$

where $w_t^0$ is an additional Brownian motion, independent of $w_t$. Then since $E\,w_T^0 = 0$
we have

(2.10)  $J(u) = E[x_T^0 + \Phi(x_T)] = E[\Phi(x_T^0, x_T)]$

Let C denote the space of $R^{n+1}$-valued continuous functions on [0, T] and $(F_t)$ the
increasing family of σ-fields generated by the coordinate functions $\{x_t\}$ in C. Since
(2.1), (2.9) define a process $(x_t^0, x_t)$ with a.s. continuous sample functions, this
induces a measure, say $\mu_u$, on $(C, F_T)$ and the cost can be expressed as

$J(u) = \int_C \Phi(x_T^0, x_T)\,\mu_u(dx)$

It turns out that each $\mu_u$ is absolutely continuous with respect to the measure $\mu$
induced by $(x_t^0, x_t)$ with f = c = 0. Thus in its abstract form the control problem
has the following ingredients:

(i)  A probability space $(\Omega, F_T, \mu)$

(ii)  A family of measures $(\mu_u,\ u \in \mathcal{U})$ absolutely continuous with respect to $\mu$
(or, equivalently, a family of positive random variables $(\ell_u)$ such that
$E\,\ell_u = 1$ for each $u \in \mathcal{U}$)

(iii)  An $F_T$-measurable random variable $\Phi$

The problem is then to choose $u \in \mathcal{U}$ so as to minimize $E_u \Phi = E[\ell_u \Phi]$. In many cases
it is possible to specify the Radon-Nikodym derivative $\ell_u$ directly in order to achieve
the appropriate sample-space measure. We outline this idea in the next section before
returning to control problems in section 4.


3.  ABSOLUTELY  CONTINUOUS  TRANSFORMATION  OF  MEASURES 

Let $(\Omega, F, P)$ be a probability space and $(F_t)_{0 \leq t \leq 1}$ be an increasing family of
sub-σ-fields of F such that

(3.1)  (i)  Each $F_t$ is completed with all null sets of F

       (ii)  $(F_t)$ is right-continuous: $F_t = \bigcap_{s>t} F_s$

       (iii)  $F_0$ is the completion of the trivial σ-field $\{\emptyset, \Omega\}$

       (iv)  $F_1 = F$

Suppose $P_u$ is a probability measure such that $P_u \ll P$. Define

(3.2)  $L_1 = dP_u/dP$

and

(3.3)  $L_t = E[L_1 \mid F_t]$

Then $L_t$ is a positive martingale, $E L_t = 1$, and $L_0 = 1$ a.s. in view of (3.1)(iii).
According to [63, VI T4] there is a modification of $(L_t)$ whose paths are
right-continuous with left hand limits (we denote $L_{t-} = \lim_{s \uparrow t} L_s$). Define

$T = 1 \wedge \inf\{t : L_t \wedge L_{t-} = 0\}$

$T_n = 1 \wedge \inf\{t : L_t < 1/n\}$

Then $T_n \uparrow T$, $T_n \leq T$ and Meyer shows in [64, VI] that $L_t(\omega) = 0$ for all
$t \geq T(\omega)$, a.s.


Suppose $(X_t)$ is a given non-negative local martingale of $(F_t)$ with $X_0 = 1$ a.s.
Then $X_t$ is always a supermartingale, since, if $\tau_n$ is an increasing sequence of
localizing times and s < t, using Fatou's lemma we have:

$X_s = \lim_n X_{s \wedge \tau_n} = \lim_n E[X_{t \wedge \tau_n} \mid F_s] \geq E[\liminf_n X_{t \wedge \tau_n} \mid F_s] = E[X_t \mid F_s]$

It follows that $E X_t \leq 1$ for all t and $X_t$ is a martingale if and only if $E X_1 = 1$.
This is relevant below because we will want to use (3.2), (3.3) to define a measure
$P_u$ from a given process $L_t$ which, however, is a priori only known to be a local
martingale.


Let $(M_t)$ be a local martingale of $(F_t)$ and consider the equation

(3.4)  $L_t = 1 + \int_0^t L_{s-}\,dM_s$

It was shown by Doléans-Dade [28] (see also [64, IV 25]) that there is a unique local
martingale $(L_t)$ satisfying this, and that $L_t$ is given explicitly by

$L_t = \exp\big(M_t - \tfrac12 \langle M^c, M^c\rangle_t\big) \prod_{s \leq t} (1 + \Delta M_s)\,e^{-\Delta M_s}$

Here $M^c$ is the "continuous part" of the local martingale M (see [64, IV 9]) and the
countable product is a.s. absolutely convergent. We denote $L_t = \mathcal{E}(M)_t$ (the
"Doléans-Dade exponential").
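
The explicit formula can be checked by hand on a simple pure-jump case. The sketch below evaluates the Doléans-Dade formula for a path with finitely many jumps and applies it to a compensated Poisson martingale $M_t = a(N_t - \lambda t)$; this example, the function names and the parameter names are assumptions introduced here, not part of the text. For this M the continuous martingale part vanishes and the formula reduces to $(1+a)^{N_t} e^{-a\lambda t}$, whose expectation is 1.

```python
import math


def doleans_dade(m_t, cont_qv_t, jumps):
    """Evaluate exp(M_t - <M^c,M^c>_t / 2) * prod (1 + dM_s) exp(-dM_s).

    m_t       : value of the local martingale M at time t
    cont_qv_t : <M^c, M^c>_t, quadratic variation of the continuous part
    jumps     : the (finitely many) jump sizes dM_s for s <= t
    """
    value = math.exp(m_t - 0.5 * cont_qv_t)
    for dm in jumps:
        value *= (1.0 + dm) * math.exp(-dm)
    return value


def compensated_poisson_exponential(a, lam, t, n_jumps):
    """E(M)_t for M_t = a (N_t - lam t), on the event N_t = n_jumps (illustrative case)."""
    m_t = a * (n_jumps - lam * t)      # all jumps have size a; M^c = 0
    return doleans_dade(m_t, 0.0, [a] * n_jumps)
```

Averaging `compensated_poisson_exponential` against the Poisson distribution of $N_t$ gives 1 whenever a > -1, in line with the martingale property discussed next.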

Suppose $\Delta M_s \geq -1$ for all $(s, \omega)$. Then $L_t$ is a non-negative local martingale, and
hence according to the remarks above is a martingale if and only if $E L_1 = 1$. Its
utility in connection with measure transformation lies in the following result, due
to van Schuppen and Wong [69].

(3.5)  Suppose $E L_1 = 1$ and define a measure $P_u$ on $(\Omega, F_1)$ by (3.2). Let X be a
local martingale such that the cross-variation process $\langle X, M\rangle$ exists. Then
$\hat X_t := X_t - \langle X, M\rangle_t$ is a $P_u$-local martingale.

Note that from the general formula connecting Radon-Nikodym derivatives and
conditional expectations we have

(3.6)  $E_u[\hat X_t \mid F_s] = \dfrac{E[L_t \hat X_t \mid F_s]}{L_s}$

and consequently $\hat X_t$ is a $P_u$-local martingale if and only if $\hat X_t L_t$ is a P-local
martingale. One readily verifies that this is so with $\hat X_t$ defined as above, using the
general change of variables formula for semimartingales [64, IV 21].

Conditions for the existence of $\langle X, M\rangle$ are given by Yoeurp [79]. Recall that
the "square brackets" process [X, M] is defined for any pair of local martingales
X, M by

$[X, M]_t = \langle X^c, M^c\rangle_t + \sum_{s \leq t} \Delta X_s\,\Delta M_s$

Yoeurp defines $\langle X, M\rangle$ as the dual predictable projection (in the sense of Dellacherie
[27]) of [X, M], when this exists, and gives conditions for this [79, Thm. 1.12].
(This definition coincides with the usual one [52] when X and M are locally square
integrable.) In fact a predictable process A such that X - A is a $P_u$-local martingale
exists only when these conditions are satisfied (see also [64, VI 22]).

An exhaustive study of conditions under which $E\,\mathcal{E}(M)_1 = 1$ is given by Lepingle
and Memin in [57]. A typical condition is that $\Delta M > -1$ and

(3.7)  $E\Big[\exp\big(\tfrac12 \langle M^c, M^c\rangle_1\big) \prod_t (1 + \Delta M_t) \exp\Big(\dfrac{-\Delta M_t}{1 + \Delta M_t}\Big)\Big] < \infty$

This generalizes an earlier condition for the continuous case given by Novikov
[67]. We will mention more specific results for special cases below; see also
references [2], [3], [12], [13], [30], [36], [43], [56], [60], [77].

Let us now specialize to the case where $x_t$ is a Brownian motion with respect to
the σ-fields $F_t$, and M is a stochastic integral

$M_t = \int_0^t \phi_s\,dx_s$

where $\phi$ is an adapted process satisfying

$\int_0^t \phi_s^2\,ds < \infty$  a.s. for each t.

Then $\langle M^c, M^c\rangle_t = \langle M, M\rangle_t = \int_0^t \phi_s^2\,ds$ and $\langle M, x\rangle_t = \int_0^t \phi_s\,ds$, so that

$L_t = \exp\Big(\int_0^t \phi_s\,dx_s - \tfrac12 \int_0^t \phi_s^2\,ds\Big)$

and, by (3.5),

(3.10)  $B_t := x_t - \int_0^t \phi_s\,ds$

is a $P_u$-local martingale (assuming $E L_1 = 1$). Since $x_t$ has continuous paths, $\langle x, x\rangle_t$
is the sample path quadratic variation of $x_t$ [52] and this is invariant under
absolutely continuous change of measure. It follows from (3.10), since the last term
is a continuous process of bounded variation, that

$\langle B, B\rangle_t\,(P_u) = \langle x, x\rangle_t\,(P) = t$

and hence that $B_t$ is a $P_u$-Brownian motion, in view of the Kunita-Watanabe
characterization [64, III 102]. This is the original "Girsanov theorem" [43]. A full
account of it will be found in Chapter 6 of Liptser and Shiryaev's book [60]. In
particular, theorem 6.1 of [60] gives Novikov's condition: $E L_1 = 1$ if $\phi$ satisfies
the integrability condition above and

(3.11)  $E \exp\Big(\tfrac12 \int_0^1 \phi_s^2\,ds\Big) < \infty$
The Girsanov theorem is used to define "weak solutions" in stochastic differential
equations. Suppose $f : [0, 1] \times C \to R$ is a bounded non-anticipative functional on
the space of continuous functions and define

$\phi(t, \omega) = f(t, x(\cdot, \omega))$

where $x_t$ is a P-Brownian motion as above. Then (3.11) certainly holds and from (3.10)
we see that under measure $P_u$ the process $x_t$ satisfies

(3.12)  $dx_t = f(t, x)\,dt + dB_t$

where $B_t$ is a $P_u$-Brownian motion, i.e. $(x_t, F_t, P_u)$ is a "weak solution" of the
stochastic differential equation (3.12). (It is not a "strong" or "Ito" solution since
B does not necessarily generate x; a well-known example of Tsyrelson [72], [60,
§4.4.8] shows that this is possible.) The reader is referred to [60] for a
comprehensive discussion of weak and strong solutions, etc. Suffice it to say that the
main advantage of the weak solution concept for control theory is that there is no
requirement that the dependence of f on x in (3.12) be smooth (e.g., Lipschitz as the
standard Ito conditions require), so that such things as "bang-bang" controls [3],
[21] fit naturally into this framework.
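
The weak-solution construction can be illustrated numerically: simulate Brownian paths under P, accumulate the Girsanov density $L_1$ along each path, and use it as a weight to compute expectations under $P_u$. The sketch below does this for a discontinuous "bang-bang" drift $\phi(t, x) = -\mathrm{sign}(x_t)$; the function names, the choice of drift and the sample sizes are assumptions made here for illustration, and the drift depends on the current state only, which is a special case of a non-anticipative functional.

```python
import math
import random


def girsanov_estimates(phi, g, n_steps=100, n_paths=4000, seed=1):
    """Return (E[L_1], E[L_1 g(x_1)], E[g(x_1)]) for Brownian x under P on [0, 1].

    L_1 = exp( int phi dx - 1/2 int phi^2 ds ), discretized with left endpoints,
    so E[L_1 g(x_1)] approximates E_u[g(x_1)] for dx = phi dt + dB under P_u.
    """
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    sqrt_dt = math.sqrt(dt)
    sum_l = sum_lg = sum_g = 0.0
    for _ in range(n_paths):
        x = 0.0
        log_l = 0.0
        for k in range(n_steps):
            drift = phi(k * dt, x)               # non-anticipative: uses the past only
            dx = sqrt_dt * rng.gauss(0.0, 1.0)   # Brownian increment under P
            log_l += drift * dx - 0.5 * drift * drift * dt
            x += dx
        l1 = math.exp(log_l)
        sum_l += l1
        sum_lg += l1 * g(x)
        sum_g += g(x)
    return sum_l / n_paths, sum_lg / n_paths, sum_g / n_paths


def bang_bang(t, x):
    """Discontinuous drift pushing the state towards the origin (illustrative)."""
    return -1.0 if x > 0 else 1.0
```

Calling `girsanov_estimates(bang_bang, abs)` should give a first component near 1 (the density integrates to one) and a weighted mean of $|x_1|$ below the unweighted one, reflecting the restoring drift under $P_u$. No Lipschitz property of the drift is used anywhere.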

4.  CONTROLLED  STOCHASTIC  DIFFERENTIAL  EQUATIONS  - COMPLETE  OBSERVATIONS  CASE 


This problem, a generalization of that considered in §2, is the one for which
the martingale approach has reached its most definitive form, and it seems worth
giving a self-contained outline immediately rather than attempting to deduce the
results as special cases of the general framework considered in §5. The results below
were obtained in a series of papers: Rishel [68], Beneš [2], Duncan and Varaiya [30],
Davis and Varaiya [25], Davis [16], and Elliott [34].

Let $\Omega$ be the space of continuous functions on [0, 1] to $R^n$, $(w_t)$ the family of
coordinate functions and $F_t^0 = \sigma\{w_s,\ s \leq t\}$. Let P be Wiener measure on $(\Omega, F_1^0)$ and
$F_t$ be the completion of $F_t^0$ with null sets of $F_1^0$. Suppose $\sigma : [0, 1] \times \Omega \to R^{n \times n}$ is a
matrix-valued function such that

(4.1)  (i)  $\sigma_{ij}$ is $F_t$-predictable

       (ii)  $|\sigma_{ij}(t, x) - \sigma_{ij}(t, y)| \leq K \sup_{0 \leq s \leq t} |x_s - y_s|$

       (iii)  $\sigma(t, x)$ is non-singular for each (t, x) and $|(\sigma^{-1}(t, x))_{ij}| \leq K$

(Here K is a fixed constant, independent of t, i, j.) Then there exists a unique
strong solution to the stochastic differential equation

$dx_t = \sigma(t, x)\,dw_t, \qquad x_0 \in R^n$ given.
t t o 

Now let U be a compact metric space, and $f : [0, 1] \times C \times U \to R^n$ a given function which
is continuous in $u \in U$ for fixed $(t, x) \in [0, 1] \times C$, an $F_t$-predictable process as a
function of (t, x) for fixed $u \in U$, and satisfies

(4.2)  $|f(t, x, u)| \leq K\big(1 + \sup_{s \leq t} |x_s|\big)$

Now let $\mathcal{U}$ be the family of $F_t$-predictable U-valued processes and for $u \in \mathcal{U}$ define

$L_t(u) = \exp\Big(\int_0^t \big(\sigma^{-1}(s, x) f(s, x, u_s)\big)'\,dw_s - \tfrac12 \int_0^t |\sigma^{-1} f|^2\,ds\Big)$

The Girsanov theorem as given in §3 above generalizes easily to the vector case, and
condition (4.2) implies the vector version of Novikov's condition (3.11) (see [60,
p. 221]). Thus $E L_1(u) = 1$ and defining a measure $P_u$ by

$dP_u/dP = L_1(u)$

we see that under $P_u$ the process $x_t$ satisfies

(4.3)  $dx_t = f(t, x, u_t)\,dt + \sigma(t, x)\,dw_t^u$

where $w_t^u$ is a $P_u$-vector Brownian motion. The cost associated with $u \in \mathcal{U}$ is now

(4.4)  $J(u) = E_u\Big[\int_0^1 c(t, x, u_t)\,dt + \Phi(x_1)\Big]$

where c, $\Phi$ are bounded measurable functions and c satisfies also the same conditions
as f.

It is clear that $\sigma$ must be non-singular if weak solutions are to be defined as
above (cf. the uniform ellipticity condition (2.5)), but an important class of
"degenerate" systems is catered for, namely those of the form

(4.5)  $dx_t^1 = f^1(t, x_t^1, x_t^2)\,dt$

(4.6)  $dx_t^2 = f^2(t, x_t^1, x_t^2, u_t)\,dt + \sigma(t, x_t^1, x_t^2)\,dw_t$

where $\sigma$ is nonsingular and $f^1$ is Lipschitz in $x^1$ uniformly in $(t, x^2)$. Then (4.5) has
a unique solution $x_t^1 = X_t(x^2)$ for each given trajectory $x^2$, and (4.6) can be rewritten
as

$dx_t^2 = f^2(t, X_t(x^2), x_t^2, u_t)\,dt + \sigma(t, X_t(x^2), x_t^2)\,dw_t$

which is in the form (4.3). This situation arises when a scalar n'th-order
differential equation is put into 1st-order vector form.
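
For instance, a scalar second-order equation driven by white noise fits the pattern (4.5), (4.6) as follows. The functions h and the formal white-noise notation are introduced here only for illustration and are not part of the text.

```latex
% Hypothetical scalar second-order equation with white-noise forcing:
%   \ddot y_t = h(t, y_t, \dot y_t, u_t) + \sigma(t, y_t, \dot y_t)\,\dot w_t .
% Put x^1_t = y_t and x^2_t = \dot y_t:
\begin{align*}
 dx^1_t &= x^2_t\,dt, \\
 dx^2_t &= h(t, x^1_t, x^2_t, u_t)\,dt + \sigma(t, x^1_t, x^2_t)\,dw_t .
\end{align*}
% Here f^1(t, x^1, x^2) = x^2 does not depend on x^1, so the Lipschitz
% condition holds trivially, and (4.5) is solved pathwise by
\[
 X_t(x^2) = x^1_0 + \int_0^t x^2_s\,ds .
\]
```

The noise enters only the second component, so the full diffusion matrix is singular, but the reduced equation for $x^2$ has a non-singular coefficient and a path-dependent drift, which is exactly what (4.3) allows.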

Fix $t \in [0, 1]$ and define the conditional remaining cost at time t as

$\psi_t^u = E_u\Big[\int_t^1 c_s^u\,ds + \Phi(x_1) \,\Big|\, F_t\Big]$

(Here and below we will write $c(s, x, u_s)$ as $c_s^u$, and similarly for f.) It
is seen from the formula (3.6) that $\psi_t^u$ only depends on u restricted to the interval
[t, 1] and since all measures $P_u$ are equivalent the null sets up to which $\psi_t^u$ is defined
are also control-independent; in fact $\psi_t^u$ is a well-defined element of $L_1(\Omega, F_t, P)$ for
each $u \in \mathcal{U}$. Since $L_1$ is a complete lattice we can define the lattice infimum

$W_t = \bigwedge_{u \in \mathcal{U}} \psi_t^u$

as an $F_t$-measurable random variable. This is the value function (or value process).
It satisfies the following principle of optimality, originally due to Rishel [68]:
for each fixed $u \in \mathcal{U}$ and $0 \leq t \leq \tau \leq 1$,

(4.7)  $W_t \leq E_u\Big[\int_t^\tau c_s^u\,ds \,\Big|\, F_t\Big] + E_u[W_\tau \mid F_t]$


The proof of this depends on the fact that the family $\{\psi_t^u : u \in \mathcal{U}\}$ has the "ε-lattice
property": see §5 below. Now define

$M_t^u = \int_0^t c_s^u\,ds + W_t$

This has the same interpretation as in (2.6) above. Note that since $x_0$ is assumed to
be a fixed constant,

(4.8)  $M_0^u = W_0 = \inf_{v \in \mathcal{U}} J(v)$

       $M_1^u = \int_0^1 c_s^u\,ds + \Phi(x_1)$ = "sample cost".

The statement of the principle of optimality is now exactly as in (2.8). Firstly
(4.7) implies that $M_t^u$ is a $P_u$-submartingale for each u. Now if $M_t^u$ is a $P_u$-martingale
then $E_u M_0^u = E_u M_1^u$, which implies u is optimal in view of (4.8), while if u is optimal
then for any t,

$W_0 = E_u\Big[\int_0^t c_s^u\,ds + \psi_t^u\Big]$

Now for any control we have from (4.7)

$W_0 \leq E_u\Big[\int_0^t c_s^u\,ds + W_t\Big]$

and hence

$E_u[W_t - \psi_t^u] \geq 0.$

But by definition $W_t \leq \psi_t^u$ a.s.; thus $W_t = \psi_t^u$ a.s. and therefore $M_t^u = E_u[M_1^u \mid F_t]$. So
$M_t^u$ is a martingale if and only if u is optimal.

Fix $u \in \mathcal{U}$. A direct argument shows that the function $t \to E_u M_t^u$ is right continuous,
and it follows from [63, VI T4] that $M_t^u$ has a right-continuous modification. The
conditions for the Meyer decomposition [63, VII T31] are thus met, so there exists
a unique predictable increasing process $A_t^u$ with $A_0^u = 0$ and a martingale $N_t^u$ such that

$M_t^u = W_0 + A_t^u + N_t^u$

We now want to represent the martingale $N_t^u$ as a stochastic integral. If the σ-fields
$F_t$ were generated by a Brownian motion then this representation would be a standard
result [15], [52], [60], but here (4.3) is only a weak solution, so $(w_t^u)$ does not
necessarily generate $(F_t)$. Nevertheless it was proved by Fujisaki, Kallianpur and
Kunita [43] (see also [25], [60]) that all $F_t$-martingales are in fact stochastic
integrals of $w^u$, i.e. there exists an adapted process $g_t$ such that

$\int_0^1 |g_s|^2\,ds < \infty$  a.s.

and

(4.9)  $N_t^u = \int_0^t g_s \sigma_s\,dw_s^u$

From the definition of $M^u$ we now have

(4.10)  $W_t = W_0 + \int_0^t g_s \sigma_s\,dw_s^u + A_t^u - \int_0^t c_s^u\,ds$

Now take another control $v \in \mathcal{U}$. By definition

$M_t^v = \int_0^t c_s^v\,ds + W_t$

and hence, using (4.3) and (4.10) we get

(4.11)  $M_t^v = W_0 + \int_0^t g_s \sigma_s\,dw_s^v + A_t^u + \int_0^t \big(H_s(v_s) - H_s(u_s)\big)\,ds$

where

(4.12)  $H_s(u_s) = g_s f(s, x, u_s) + c(s, x, u_s)$

Now (4.11) gives a representation of $M_t^v$ as a "special semimartingale" (= local
martingale + predictable bounded variation process) under measure $P_v$ and it is known
that such a decomposition is unique [64, IV 32]. But we know that $M^v$ is a submartingale
with decomposition

(4.13)  $M_t^v = W_0 + N_t^v + A_t^v$

so the terms in (4.11), (4.13) must correspond. In particular this shows that
the integrand g in (4.9) does not depend on the control u. We can now state some
conditions for optimality.


(4.14)  A necessary condition. If $u^* \in \mathcal{U}$ is optimal then it minimizes
(a.s. $dP \times dt$) the Hamiltonian $H_s$ of (4.12).

Indeed, if $u^*$ is optimal then $A_t^{u^*} = 0$. Referring to (4.11) with $u = u^*$ we see
that (4.14) is just the statement that the last term in (4.11) is an increasing
process.
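
Read pointwise, the condition says that at (almost) every time and sample path the optimal control value minimizes $H_s(v) = g_s f(s,x,v) + c(s,x,v)$ over the control set. A minimal sketch of that pointwise minimization over a finite grid of candidate control values is given below. The function names and the grid are assumptions made here, and the adjoint row vector g is simply passed in as data; the text defines it only implicitly through (4.9).

```python
def hamiltonian(g, f_val, c_val):
    """H = g f + c for a row vector g, drift vector f_val and scalar cost rate c_val."""
    return sum(gi * fi for gi, fi in zip(g, f_val)) + c_val


def minimize_hamiltonian(g, f, c, s, x, controls):
    """Return (u, H(u)) minimizing the Hamiltonian over a finite list of control values.

    f(s, x, u) returns a list (the drift vector); c(s, x, u) returns a float.
    x is treated as an opaque argument (it may stand for the past of the path).
    """
    best_u, best_h = None, float("inf")
    for u in controls:
        h = hamiltonian(g, f(s, x, u), c(s, x, u))
        if h < best_h:
            best_u, best_h = u, h
    return best_u, best_h
```

With f(s, x, u) = [u] and c = 0 on the grid [-1, 1], the minimizer is the endpoint opposite in sign to g, which is the familiar bang-bang structure.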

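(4.15)  A sufficient condition for optimality. For a given control $u^*$, define the
$P_{u^*}$-martingale

$\rho_t^* = E_{u^*}\Big[\int_0^1 c_s^{u^*}\,ds + \Phi(x_1) \,\Big|\, F_t\Big]$

Then $u^*$ is optimal if for any other $u \in \mathcal{U}$ the process

$I_t^u = \rho_t^* + \int_0^t (c_s^u - c_s^{u^*})\,ds$

is a $P_u$-submartingale.

This is evident since then

$J(u^*) = I_0^u \leq E_u I_1^u = J(u).$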

We can recast (4.15) as a local condition: since $\rho_t^*$ is a martingale, it has a
representation

$\rho_t^* = J(u^*) + \int_0^t \tilde g_s \sigma_s\,dw_s^{u^*}$

Now suppose that

(4.16)  $\tilde H_t(u_t^*) \leq \tilde H_t(v)$  a.e. for all $v \in U$

where $\tilde H$ is as in (4.12) but with $\tilde g$ replacing g. Then a calculation similar to (4.11)
shows that $I_t^u$ is a local $P_u$-submartingale for any $u \in \mathcal{U}$; since $I_0^u = J(u^*)$, this
implies that if $\tau_n$ is a sequence of localizing times then

$E_u I_{1 \wedge \tau_n}^u \geq J(u^*)$

But the process $I_t^u$ is uniformly bounded and $I_{1 \wedge \tau_n}^u \to I_1^u$ as $n \to \infty$, so that

$J(u) = E_u I_1^u \geq J(u^*)$

Thus (4.16) is a sufficient condition for optimality and it is easily seen that if
it is satisfied then $\rho_t^* = M_t^{u^*}$ and $g_t = \tilde g_t$, a.e. See [11] for an application.

Since the process g is defined independently of the existence of any optimal
control it seems clear from the above that an optimal control should be constructed
by minimizing the Hamiltonian (4.12). Under the conditions we have stated, an
implicit function lemma of Beneš [1] implies the existence of a predictable process
$u_t^0$ such that

$H_t(u_t^0) = \min_{v \in U} H_t(v)$  a.e.

Using (4.11) with $u = u^0$ gives

$M_t^v \geq W_0 + \int_0^t g_s \sigma_s\,dw_s^v + A_t^{u^0}$

and hence, taking expectations at t = 1,

(4.17)  $E_v(A_1^{u^0}) \leq J(v) - W_0$

To show $u^0$ is optimal it suffices, according to the criterion (2.8), to show that
$A_1^{u^0} = 0$ a.s. Here we need some results on compactness of the sets of Girsanov
exponentials, due to Beneš [2] and Duncan and Varaiya [30]. Let A be the set of
$R^n$-valued $F_t$-predictable processes $\phi$ satisfying

$|\phi(t, x)| \leq K\big(1 + \sup_{s \leq t} |x_s|\big), \qquad (t, x) \in [0, 1] \times \Omega$

(thus $f^u \in A$ for $u \in \mathcal{U}$, see (4.2)) and let

$D = \{\delta(\phi) : \phi \in A\}$

where

$\delta(\phi) = \exp\Big(\int_0^1 (\sigma^{-1}\phi)'\,dw - \tfrac12 \int_0^1 |\sigma^{-1}\phi|^2\,dt\Big)$

Then Beneš' result is

(4.18)  D is a weakly compact subset of $L_1(\Omega, F_1, P)$ and $\ell > 0$ a.s. for all $\ell \in D$.

Returning to (4.17) we can, in view of (4.8), choose a sequence $u_n \in \mathcal{U}$ such that
$J(u_n) \downarrow W_0$ and hence such that for any positive integer N,

(4.19)  $E_{u_n}[A_1^{u^0} \wedge N] = E[\delta(f^{u_n})(A_1^{u^0} \wedge N)] \to 0, \qquad n \to \infty.$

In view of (4.18) there is a subsequence of $\delta(f^{u_n})$ converging weakly to some $\rho \in D$;
hence from (4.19)

$E[\rho\,(A_1^{u^0} \wedge N)] = 0$

and it follows that $A_1^{u^0} = 0$ a.s. We thus have:

(4.20)  Under the stated conditions, an optimal policy $u^0$ exists, constructed by
minimizing the Hamiltonian (4.12).

Two comments on this result: firstly, it is possible to recast the problem so
as to have a purely terminal cost by introducing an extra state $x^0$ as in (2.9), (2.10).
However it is important not to do this here, since an extra Brownian motion $w^0$ is
introduced as well, and there is then no way of showing that the optimal policy $u^0$
does not depend on $w^0$, i.e. one gets a possibly "randomized" optimal policy this
way. Secondly, the existence result (4.20) was originally proved in [2] and [30]
just by using the compactness properties of the density sets. However they were
obliged to assume convexity of the "velocity set" f(t, x, U) in order that the set
$D(\mathcal{U}) = \{\delta(f^u) : u \in \mathcal{U}\}$ be convex (it can then be shown to be weakly closed). Finally
it should be remarked that (4.20) is a much stronger result than anything available
in deterministic control theory, the reason being of course that the noise "smooths
out" the process.

A comparison of (2.3) and (4.12) shows that the process $g_t$ plays the role of the
gradient $V_x(t, x_t)$ in the Markov case, so that in a sense the submartingale
decomposition theorems are providing us with a weak form of differentiation. The
drawback with the martingale approach is of course that while the function $V_x$ can
(in principle) be calculated by solving the Bellman equation, the process $g_t$ is only
defined implicitly by (4.9), so that the optimality conditions (4.14), (4.15) do not
provide a constructive procedure for calculating the optimal $u^0$, or for verifying
whether a candidate control satisfies the necessary condition (4.14). Some progress on
this has been made by Haussmann [44], but it depends on $u^0(t, x)$ being a smooth function
of $x \in \Omega$, which is very restrictive.


Suppose that $\Phi$ is Fréchet differentiable as a function of $x \in \Omega$; then by the Riesz representation theorem there is, for each $x \in \Omega$, an $R^n$-valued Radon measure $\mu_x$ such that for $y \in \Omega$

$$\Phi(x+y) = \Phi(x) + \int_{[0,1]} y(s)\,\mu_x(ds) + o(\|y\|).$$

Since $u^0$ is optimal, the associated martingale $M_t^0$ satisfies

$$M_t^0 = J(u^0) + \int_0^t g_s \sigma_s\, dw_s^{u^0}$$


and Haussmann [45], [46] (see also [19]) shows that, under some additional smoothness assumptions, $g_t$ is given by

$$g_t = E\Big[\int_{]t,1]} \mu_x(ds)\,\Psi(s,t) \,\Big|\, \mathcal{F}_t\Big]$$

where $\Psi(s,t)$ is the (random) fundamental matrix solution of the linearized equation corresponding to (4.3) with $u = u^0$. This representation gives, in some cases, an "adjoint equation" satisfied by $g_t$, along the lines originally shown by Kushner [ ].

Finally let us remark that in all of the above the state space of $x_t$ is $R^n$. Some problems - for example, control of the orientation of a rigid body - are more naturally formulated with a differentiable manifold as state space. Such problems have been treated by Duncan [29] using versions of the Girsanov theorem etc. due to Duncan and Varaiya [31].


5.  GENERAL  FORMULATION  OF  STOCHASTIC  CONTROL  PROBLEMS 

The first abstract formulation of dynamic programming for continuous-time stochastic control problems was given by Rishel [68] who isolated the "principle of optimality" in a form similar to (4.7). The submartingale formulation was given by Striebel [70] who also introduced the important "$\varepsilon$-lattice property." Other papers formulating stochastic control problems in some generality are those of Boel and Varaiya [11], Memin [61], Elliott [37], [38], Boel and Kohlmann [9], [10], Davis and Kohlmann [23] and Brémaud and Pietri [14].

We shall sketch briefly a formulation, somewhat similar to that of (2.7), which is less general than that of Striebel [70] but sufficiently general to cover all of the applications considered in this paper.

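The basic ingredients of the control problem are

(i) A probability space $(\Omega, \mathcal{F}, P)$

(ii) Two families $(\mathcal{F}_t)$, $(\mathcal{Y}_t)$ $(0 \le t \le 1)$ of increasing, right-continuous, completed sub-$\sigma$-fields of $\mathcal{F}$, such that $\mathcal{Y}_t \subset \mathcal{F}_t$ for each $t$.

(iii) A non-negative $\mathcal{F}_1$-measurable random variable $\Phi$.

(iv) A measurable space $(U, \mathcal{E})$

(v) A family of control processes $\{\mathcal{U}_s^t,\ 0 \le s \le t \le 1\}$. Each control process $u \in \mathcal{U}_s^t$ is a $\mathcal{Y}_t$-predictable $U$-valued function on $]s,t] \times \Omega$.

The family $\{\mathcal{U}_s^t\}$ is assumed to be closed under

(5.1)
restriction: $u \in \mathcal{U}_s^t \Rightarrow u|_{]s,\tau]} \in \mathcal{U}_s^\tau$ for $s \le \tau \le t$

concatenation: $u \in \mathcal{U}_s^\tau$, $v \in \mathcal{U}_\tau^t \Rightarrow w \in \mathcal{U}_s^t$ where
$$w(\sigma,\omega) = \begin{cases} u(\sigma,\omega), & \sigma \in\, ]s,\tau] \\ v(\sigma,\omega), & \sigma \in\, ]\tau,t] \end{cases}$$

finite mixing: $u, v \in \mathcal{U}_s^t$, $A \in \mathcal{Y}_s \Rightarrow w \in \mathcal{U}_s^t$ where
$$w(\sigma,\omega) = \begin{cases} u(\sigma,\omega), & \omega \in A \\ v(\sigma,\omega), & \omega \in A^c \end{cases}$$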

We denote $\mathcal{U} = \mathcal{U}_0^1$. (In most cases $\mathcal{U}_s^t$ will consist of all predictable $U$-valued processes, but (5.1) is the set of conditions actually required for the principle of optimality below.) A control $u \in \mathcal{U}_0^t$ is assumed to determine a measure $P_u$ on $(\Omega, \mathcal{F}_t)$ which is absolutely continuous with respect to $P|_{\mathcal{F}_t}$ and such that $P_u|_{\mathcal{F}_0} = P|_{\mathcal{F}_0}$, and such that the assignment is compatible in the sense that if $u \in \mathcal{U}_0^t$, $s < t$ and $v = u|_{]0,s]}$ (so that $v \in \mathcal{U}_0^s$) then $P_v = P_u|_{\mathcal{F}_s}$. If $u \in \mathcal{U}_0^t$ and $X$ is an $\mathcal{F}_t$-measurable random variable, then $E_u X$ denotes expectation with respect to measure $P_u$. We finally assume that $E_u \Phi < \infty$ for all $u \in \mathcal{U}$, and the problem is then to choose $u \in \mathcal{U}$ so as to minimize $J(u) = E_u \Phi$.

The value process corresponding to $u \in \mathcal{U}$ is

(5.2)  $W_t^u = \bigwedge_{v} E_v[\Phi \mid \mathcal{Y}_t]$

where "$\bigwedge$" denotes the lattice infimum in $L^1(\Omega, \mathcal{Y}_t, P)$, taken over all $v \in \mathcal{U}$ such that $v|_{]0,t]} = u|_{]0,t]}$. Note that, in contrast to the situation in §4, $W_t^u$ is in general not control-independent. We nevertheless have a result analogous to (2.8), namely

(5.3)  $W_t^u$ is a submartingale for each $u \in \mathcal{U}$ and is a martingale if and only if $u$ is optimal.
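To make statement (5.3) concrete, the following is a minimal sketch in a much simpler setting than the one above: a finite-state, discrete-time controlled Markov chain with complete observations and terminal cost only. The names `value_functions` and `one_step_drift`, the transition array `P` and the numbers used are illustrative assumptions, not part of the formulation in the text. In this setting the one-step drift of the value along any control is non-negative (submartingale property) and vanishes for the minimizing control (martingale property).

```python
import numpy as np

def value_functions(P, phi, horizon):
    """Backward recursion V_T = phi, V_t(x) = min_u sum_y P[u, x, y] * V_{t+1}(y).

    P has shape (n_controls, n_states, n_states); phi is the terminal cost vector.
    """
    P = np.asarray(P, dtype=float)
    V = [None] * (horizon + 1)
    V[horizon] = np.asarray(phi, dtype=float)
    for t in range(horizon - 1, -1, -1):
        # Q[u, x] = expected value at time t+1 when control u is applied in state x
        Q = P @ V[t + 1]
        V[t] = Q.min(axis=0)
    return V

def one_step_drift(P, V, t, u):
    """E_u[V_{t+1}(x_{t+1}) | x_t = x] - V_t(x) for each state x (always >= 0)."""
    return np.asarray(P, dtype=float)[u] @ V[t + 1] - V[t]

# Hypothetical two-state, two-control example.
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.6, 0.4]]])
V = value_functions(P, phi=[1.0, 3.0], horizon=3)
drifts = np.array([one_step_drift(P, V, 0, u) for u in range(2)])
```

Each row of `drifts` corresponds to one control; the entries are non-negative, and for each state at least one control gives drift zero.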

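Note that by inclusion and using the compatibility condition, for any $\tau > t$,

$$W_t^u \le \bigwedge_{v} E_v[\Phi \mid \mathcal{Y}_t] = \bigwedge_{v} E_u\big[E_v[\Phi \mid \mathcal{Y}_\tau] \,\big|\, \mathcal{Y}_t\big],$$

where the infima are taken over those $v$ that agree with $u$ on $]0,\tau]$, so that the first statement of (5.3) is equivalent to the assertion that $\bigwedge_v$ and $E_u[\,\cdot \mid \mathcal{Y}_t]$ may be interchanged, and according to Striebel [70] (see also [26] for a summary) this is possible if the random variables $E_v[\Phi \mid \mathcal{Y}_\tau]$ have the $\varepsilon$-lattice property:

(5.4)  if $v_1, v_2 \in \mathcal{U}_\tau^1$ then there exists $v_3 \in \mathcal{U}_\tau^1$ such that, with $\bar{v}_i$ denoting the concatenation of $u$ and $v_i$,

$$E_{\bar{v}_3}[\Phi \mid \mathcal{Y}_\tau] \le E_{\bar{v}_1}[\Phi \mid \mathcal{Y}_\tau] \wedge E_{\bar{v}_2}[\Phi \mid \mathcal{Y}_\tau] + \varepsilon \quad \text{a.s.}$$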

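Now it is evident that under assumptions (5.1) the set $\{E_{\bar{v}}[\Phi \mid \mathcal{Y}_\tau]\}$ has the 0-lattice property, because given $v_1$, $v_2$ as above one only has to define

$$A = \{\omega : E_{\bar{v}_1}[\Phi \mid \mathcal{Y}_\tau] \le E_{\bar{v}_2}[\Phi \mid \mathcal{Y}_\tau]\}$$

and, for $t \in\, ]\tau, 1]$,

$$v_3(t,\omega) = \begin{cases} v_1(t,\omega), & \omega \in A \\ v_2(t,\omega), & \omega \in A^c \end{cases}$$

Then (5.4) holds with $\varepsilon = 0$.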
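The mixing construction can be pictured on a finite sample space. The sketch below is only an illustration under strong simplifying assumptions: the arrays `e1` and `e2` stand for the two conditional expected costs evaluated at finitely many sample points, `v1` and `v2` for the corresponding control values, and the function name `mix_controls` is hypothetical. It shows that switching on the set $A$ yields the pointwise minimum, which is the 0-lattice property.

```python
import numpy as np

def mix_controls(e1, e2, v1, v2):
    """Finite mixing on A = {e1 <= e2}: use v1 on A and v2 on the complement.

    Returns the mixed control values and the conditional cost they attain,
    assuming (as in the text) that the cost of the mixed control is e1 on A
    and e2 off A.
    """
    e1, e2 = np.asarray(e1, dtype=float), np.asarray(e2, dtype=float)
    A = e1 <= e2
    v3 = np.where(A, v1, v2)
    e3 = np.where(A, e1, e2)
    return v3, e3

# Hypothetical values at four sample points.
v3, e3 = mix_controls([1.0, 4.0, 2.0, 3.0], [2.0, 1.0, 2.0, 5.0],
                      [0, 0, 0, 0], [1, 1, 1, 1])
```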

It is clear from the definition (5.2) that $u$ is optimal if $W_t^u$ is a $P_u$-martingale, while conversely if $u$ is optimal then for any $t \in [0,1]$

(5.5)  $E_u[W_0^u] = \inf_{v \in \mathcal{U}} J(v) = J(u) = E_u\big[E_u[\Phi \mid \mathcal{Y}_t]\big]$

But by the submartingale property $E_u[W_0^u] \le E_u[W_t^u]$, and this together with (5.2) and (5.5) implies that $W_t^u = E_u[\Phi \mid \mathcal{Y}_t]$, i.e. $W_t^u$ is a $P_u$-martingale.

Statement  (5.3)  is  a general  form  of  optimality  principle  but  its  connection 
with  conventional  dynamic  programming  is  tenuous  as  there  is  a different  value 
function  for  each  control,  reflecting  the  fact  that  past  controls  can  affect  the 
expectation  of  future  performance.  This  is  suggestive  of  Feldbaum's  "dual  control" 
idea,  namely  that  an  optimal  controller  will  act  so  as  to  "acquire  information"  as 
well  as  to  achieve  direct  control  action. 

The postulates of the general model above are not, as they stand, sufficient to ensure that there is a single value function if $\mathcal{Y}_t = \mathcal{F}_t$ (complete information). Let

(5.6)  $L_t(u) = E\Big[\dfrac{dP_u}{dP} \,\Big|\, \mathcal{F}_t\Big]$

Now fix $s \in [0,1]$ and for $s \le t \le 1$ define

$$L_t(u,v) = \begin{cases} L_t(u)/L_s(v) & \text{if } L_s(v) > 0 \\ 1 & \text{if } L_s(v) = 0 \end{cases}$$

then $L_t(u,v)$ is a positive martingale and $L_s(u,v) = 1$. Then the following hypothesis ensures that there is a process $W_t$ such that $W_t^u = W_t$ in case $\mathcal{Y}_t = \mathcal{F}_t$:

(5.7)  For any $v \in \mathcal{U}$, and $u_1, u_2 \in \mathcal{U}$ such that $u_1|_{]s,1]} = u_2|_{]s,1]}$, we have

$$L_t(u_1, v) = L_t(u_2, v) \quad \text{for all } t \in\, ]s,1]$$

See [61, Lemma 3.2]. Clearly the densities $L_t(u)$ of §4 above satisfy (5.7).

A minimum principle - complete observations case

If we are to use the principle of optimality (5.3) to obtain local conditions for optimality in the form of a minimum principle it is necessary to be more specific about how the densities $L_t(u)$ are related to the controls $u \in \mathcal{U}$. This is generally through a transformation of measures as described in §3 above. A general formulation will be found in Elliott's paper [38] in this volume, but to introduce the idea let us consider the following rather special set-up.

Suppose $\mathcal{Y}_t = \mathcal{F}_t$ for each $t$, and let $M_t$ be a given $\mathcal{F}_t$-martingale with almost all paths continuous. Now take a function $\phi : [0,1] \times \Omega \times U \to R$ such that $\phi(\cdot,\cdot,u)$ is a predictable process for each $u \in U$ and continuous in $u$ for each fixed $(t,\omega)$, and for $u \in \mathcal{U}$ let $\phi^u$ denote the predictable process $\phi^u(t,\omega) = \phi(t,\omega,u(t,\omega))$. We suppose that for each $u \in \mathcal{U}$

(5.8)  $E \exp\Big(\tfrac{1}{2} \int_0^1 (\phi_s^u)^2\, d\langle M \rangle_s\Big) < \infty$

and that the measure $P_u$ is defined by

$$\frac{dP_u}{dP} = \mathcal{E}\Big(\int \phi^u\, dM\Big)_1$$

(see §3). From (3.7), condition (5.8) ensures that $P_u$ is a probability measure and that $P_u \sim P$. Now $L_t(u)$ (defined by (5.6)) satisfies the equation

$$L_t(u) = 1 + \int_0^t L_s(u)\, \phi_s^u\, dM_s$$

The uniqueness of the solution to this equation shows that condition (5.7) is satisfied, and hence that there is a single value process $W_t$, which can be shown to have a right-continuous modification [61], assuming the cost function is bounded. Then for any $u \in \mathcal{U}$, $W_t$ has the submartingale decomposition

(5.9)  $W_t = W_0 + N_t^u + A_t^u$

where $N_t^u$ is a $P_u$-martingale and $A_t^u$ a predictable increasing process. According to the translation theorem, the process

(5.10)  $dM_t^u = dM_t - \phi_t^u\, d\langle M \rangle_t$

is a continuous $P_u$-martingale. Decompose $N_t^u$ into the sum

$$N_t^u = \bar{N}_t^u + \tilde{N}_t^u$$

where $\bar{N}_t^u$ is in the stable subspace generated by $M^u$ (see [64]) and $\tilde{N}_t^u$ is orthogonal to this stable subspace. There is a predictable process $g_t$ such that

$$\bar{N}_t^u = \int_0^t g_s\, dM_s^u$$

Now consider another admissible control $v$. Using (5.9), (5.10), we see, as in (4.11), (4.12) above, that $W_t$ can be written

$$W_t = W_0 + \int_0^t g_s\, dM_s^v + \tilde{N}_t^u + \int_0^t g_s(\phi_s^v - \phi_s^u)\, d\langle M \rangle_s + A_t^u$$

Now $\tilde{N}_t^u$ is a $P_v$-martingale, since the Radon-Nikodym derivative $E_u[dP_v/dP_u \mid \mathcal{F}_t]$ is in the stable subspace generated by $M^u$ (see [37], [38]), and hence, by the uniqueness of the semi-martingale decomposition (5.9), we have

$$A_t^v = \int_0^t g_s(\phi_s^v - \phi_s^u)\, d\langle M \rangle_s + A_t^u$$

Since $A_t^v$ is an increasing process and $A_t^u \equiv 0$ if $u$ is optimal, we have the following minimum principle:

(5.11)  If $u \in \mathcal{U}$ is optimal and $v$ is any admissible control then for almost all $\omega$

$$g_s\, \phi(s,\omega,u_s) \le g_s\, \phi(s,\omega,v_s) \quad \text{a.e. } (d\langle M \rangle_s)$$

In particular if $\mathcal{U}$ consists of all predictable $U$-valued processes then

$$g_s\, \phi(s,\omega,u_s) = \min_{v \in U} g_s\, \phi(s,\omega,v)$$

The importance of this type of result is that no martingale representation result is required, since the "orthogonal martingale" $\tilde{N}_t^u$ plays no role in the optimality conditions (things are somewhat more complicated if the basic martingale $M_t$ is not continuous).
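The pointwise minimization in (5.11) can be sketched numerically once a value of $g_s$ is given; the text stresses that $g_s$ is not available constructively in general, so the value passed in below is purely an assumed input. The function name `minimizing_control`, the finite candidate set and the drift function `phi` are illustrative assumptions.

```python
import numpy as np

def minimizing_control(g, phi, controls):
    """Return the control value v minimizing g * phi(v) over a finite candidate set,
    together with the minimal value of the product."""
    values = np.array([g * phi(v) for v in controls])
    k = int(np.argmin(values))
    return controls[k], float(values[k])

# Hypothetical example: phi(v) = v on U = {-1, 0, 1}. The minimizer switches
# with the sign of g, which is the familiar "bang-bang" behaviour.
u_pos, _ = minimizing_control(2.0, lambda v: v, [-1.0, 0.0, 1.0])
u_neg, _ = minimizing_control(-0.5, lambda v: v, [-1.0, 0.0, 1.0])
```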

Partial  observations  case 

Further progress in the case when $\mathcal{Y}_t \ne \mathcal{F}_t$ appears to depend on representation theorems for $\mathcal{Y}_t$-martingales, although possibly a development similar to the above could be carried out. For each $u \in \mathcal{U}$ the $P_u$-submartingale $W_t^u$ is decomposed into the sum of a martingale and an increasing process. In Memin's paper it is assumed that all $(\mathcal{Y}_t, P)$-martingales have a representation as a sum of stochastic integrals with respect to a continuous martingale and a random measure. It is shown in [48] that a similar representation then holds for $(\mathcal{Y}_t, P_u)$-martingales since $P_u \ll P$. Using this, somewhat more specific optimality conditions can be stated, but these do not lead to useful results as no genuine minimum principle can be obtained. Rather than describe them we revert to the stochastic differential equation model of §4 for which better results have been obtained.


6.  CONTROLLED  STOCHASTIC  DIFFERENTIAL  EQUATIONS  WITH  PARTIAL  INFORMATION 


Returning to the problem of §4, let us suppose that the state vector is divided into two sets of components $x_t = (y_t, z_t)$ of which only the first is observed by the controller. Define $\mathcal{Y}_t = \sigma\{y_s, s \le t\}$. Then the class of admissible controls is the set $\mathcal{N}$ of $\mathcal{Y}_t$-adapted processes with values in $U$. The objective is to choose $u \in \mathcal{N}$ so as to minimize $J(u)$ given by (4.4). Following Elliott [34] we will outline a necessary condition for optimality. Thus we suppose that $u^* \in \mathcal{N}$ is optimal (and write $c^*$, $E^*$ instead of $c^{u^*}$, $E_{u^*}$, etc.). Let

$$\psi_t^* = E^*\Big[\int_t^1 c_s^*\, ds + \Phi(x_1) \,\Big|\, \mathcal{F}_t\Big]$$

and for any $u \in \mathcal{N}$ define

$$N_t^u = \int_0^t c_s^u\, ds + \psi_t^*$$

Then $N_t^*$ is an $(\mathcal{F}_t, P^*)$-martingale and it is easily shown that

(6.1)  (i) $E^*[N_t^* \mid \mathcal{Y}_t]$ is a $(\mathcal{Y}_t, P^*)$-martingale

       (ii) $E^*[N_t^u \mid \mathcal{Y}_t] \le E^*\big[E_u[N_{t+h}^u \mid \mathcal{F}_t] \,\big|\, \mathcal{Y}_t\big]$ for any $u \in \mathcal{N}$ and $h > 0$


As in §4, we can represent $N_t^*$ as a stochastic integral with respect to the Brownian motion $w^* = w^{u^*}$, i.e. there exists an $\mathcal{F}_t$-adapted process $g_t^*$ such that

(6.2)  $N_t^* = \psi_0^* + \int_0^t g_s^* \sigma_s\, dw_s^*$

Using an argument similar to that of (4.11)-(4.12) we see that $N_t^u$ can be written

(6.3)  $N_t^u = \psi_0^* + \int_0^t g_s^* \sigma_s\, dw_s^u + \int_0^t \Delta H_s^*(u)\, ds$

where

$$\Delta H_s^*(u) = [g_s^* f(s,x,u_s) + c(s,x,u_s)] - [g_s^* f(s,x,u_s^*) + c(s,x,u_s^*)]$$

It now follows from (6.1)(ii) and (6.3) that

$$\frac{1}{h}\, E^*\Big[E_u\Big(\int_t^{t+h} \Delta H_s^*(u)\, ds \,\Big|\, \mathcal{F}_t\Big) \,\Big|\, \mathcal{Y}_t\Big] \ge 0$$

A rather delicate argument given in [34] shows that taking the limit as $h \downarrow 0$ gives $E^*[\Delta H_t^*(u) \mid \mathcal{Y}_t] \ge 0$. We thus obtain the following minimum principle:

(6.4)  Suppose $u^* \in \mathcal{N}$ is optimal and $u \in \mathcal{N}$. Then there is a set $T \subset [0,1]$ of zero Lebesgue measure such that for $t \notin T$

$$E^*[g_t^* f(t,x,u_t^*) + c(t,x,u_t^*) \mid \mathcal{Y}_t] \le E^*[g_t^* f(t,x,u_t) + c(t,x,u_t) \mid \mathcal{Y}_t] \quad \text{a.s.}$$

where $g^*$ is the process of (6.2).

This is a much better result than the original minimum principle (theorem 4.2 of [25]) since the optimal control minimizes the conditional expectation of a Hamiltonian involving a single "adjoint process" $g^*$. A similar result (including some average value state space constraints) was obtained by Haussmann [44] using the Girsanov formulation together with L.W. Neustadt's "general theory of extremals."

It is shown in [39] that a sufficient condition for optimality is that an inequality similar to (6.4) but with $E_u$ replacing $E^*$ should hold for all admissible $u$.

The disadvantage of the types of result outlined above is that they ignore the general cybernetic principle that in partially observable problems the conditional distribution of the state given the observations constitutes an "information state," on which control action should be based. In other words, the filtering operation is not explicitly brought in. Although there is a well-developed theory of filtering for stochastic differential equations [42], [60], it turns out to be remarkably difficult to incorporate this into the control problem. A look at the "separation theorem" of linear control [ is ], [78], [41, chapter 7] will show why. The separation theorem concerns a linear stochastic system of the form

(6.5)  $dx_t = A x_t\, dt + B(u_t)\, dt + G\, dw_t^1$
       $dy_t = F x_t\, dt + R^{1/2}\, dw_t^2$

where $w_t^1$, $w_t^2$ are independent vector Brownian motions, the distribution of the initial state $x_0$ is normal, and the coefficient matrices can be time-varying. It is assumed that $GG'$ and $R$ are symmetric and strictly positive definite, that the controls $u_t$ take values in a compact set $U$ and that the function $B$ is continuous. The solution of (6.5) for a given $\mathcal{Y}_t$-adapted control policy $u_t$ is then defined by standard application of the Girsanov technique and the (non-quadratic) cost is given by

$$J(u) = E_u\Big[\int_0^1 c(t,x_t,u_t)\, dt + \Phi(x_1)\Big]$$

It is shown in [24] that the conditional distribution of $x_t$ given $\mathcal{Y}_t$ is normal, with mean $\hat{x}_t$ and covariance $\Sigma(t)$ given by the Kalman filter equations:

(6.6)  $d\hat{x}_t = A\hat{x}_t\, dt + B(u_t)\, dt + \Sigma(t) F' R^{-1/2}\, d\nu_t$,  $\hat{x}_0 = E x_0$

(6.7)  $\dot{\Sigma} = A\Sigma + \Sigma A' + GG' - \Sigma F' R^{-1} F \Sigma$,  $\Sigma(0) = \mathrm{cov}(x_0)$

Here $\nu_t$ is the normalized innovations process

$$\nu_t = \int_0^t R^{-1/2}\,(dy_s - F\hat{x}_s\, ds)$$

which is a standard vector Brownian motion. Let us denote $K(t) = \Sigma(t) F' R^{-1/2}$, and let $n(\cdot, x, t)$ be the normal density function with mean $x$ and covariance $\Sigma(t)$. Now define

$$\hat{c}(t,x,u) = \int_{R^n} c(t,\xi,u)\, n(\xi,x,t)\, d\xi, \qquad \hat{\Phi}(x) = \int_{R^n} \Phi(\xi)\, n(\xi,x,1)\, d\xi$$

Then the cost $J(u)$ can be expressed as

(6.8)  $J(u) = E_u\Big[\int_0^1 \hat{c}(t,\hat{x}_t,u_t)\, dt + \hat{\Phi}(\hat{x}_1)\Big]$
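The smoothed cost $\hat{c}$ is simply a Gaussian average of $c$. As a rough illustration, the sketch below evaluates it in the scalar case by Gauss-Hermite quadrature. The function name `smoothed_cost`, the scalar restriction and the quadratic test cost are assumptions made for the example; they are not taken from the text.

```python
import numpy as np

def smoothed_cost(c, x_hat, sigma2, order=20):
    """Approximate the integral of c(xi) against the N(x_hat, sigma2) density.

    Gauss-Hermite nodes and weights integrate against exp(-z^2), so the
    substitution xi = x_hat + sqrt(2 * sigma2) * z and a factor 1/sqrt(pi)
    convert the rule into a Gaussian expectation.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    xi = x_hat + np.sqrt(2.0 * sigma2) * nodes
    return float(np.sum(weights * c(xi)) / np.sqrt(np.pi))

# Hypothetical quadratic cost: the Gaussian average of xi^2 is x_hat^2 + sigma2.
value = smoothed_cost(lambda xi: xi ** 2, x_hat=1.5, sigma2=0.3)
```

For a quadratic cost the smoothing only adds the conditional variance, which is why the classical linear-quadratic problem separates so cleanly; for the non-quadratic costs considered here the dependence on $\Sigma(t)$ is less trivial.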


The original problem is thus seen to be equivalent to a "completely observable" problem (6.6), (6.8) with "state" $\hat{x}_t$ (this characterizes the entire conditional distribution since the covariance $\Sigma(t)$ is non-random). This suggests studying "separated controls" of the form $u_t = \psi(t,\hat{x}_t)$ for some given measurable function $\psi : [0,1] \times R^n \to U$. However, such controls are, in general, not admissible: admissible controls are specified functionals of $y$, whereas the random variable $\hat{x}_t$ depends on past controls $\{u_s, s \le t\}$. One way round this difficulty is to consider (6.6)-(6.8) as an independent problem of the type considered in §4, i.e., to define the solution of (6.6) by Girsanov transformation on a new probability space, for separated controls $u(t,\hat{x})$. However we then run into the fresh difficulty that weak solutions of (6.6) are only defined if the matrix $K(t)K'(t)$ is strictly positive definite, which cannot happen unless the dimension of $y_t$ is at least as great as that of $x_t$ - a highly artificial condition. If this condition is met then we can apply (4.17) to conclude that there exists an optimal separated control, and an extra argument as in [10] shows that its cost coincides with $\inf_{u \in \mathcal{N}} J(u)$. If $\dim(y_t) < \dim(x_t)$ then some form of approximation must be resorted to.
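The rank obstruction just described is easy to see numerically. The following sketch integrates the Riccati equation (6.7) by a plain Euler scheme with constant coefficients and forms the gain $K = \Sigma F' R^{-1/2}$. The function names, the Euler discretization, the step count and the example matrices are all assumptions made for illustration.

```python
import numpy as np

def riccati_euler(A, G, F, R, Sigma0, horizon=1.0, steps=2000):
    """Euler integration of dSigma/dt = A S + S A' + G G' - S F' R^{-1} F S."""
    dt = horizon / steps
    R_inv = np.linalg.inv(R)
    S = np.array(Sigma0, dtype=float)
    for _ in range(steps):
        S = S + dt * (A @ S + S @ A.T + G @ G.T - S @ F.T @ R_inv @ F @ S)
    return S

def filter_gain(Sigma, F, R):
    """K = Sigma F' R^{-1/2}, with R^{-1/2} from the eigendecomposition of R."""
    w, V = np.linalg.eigh(R)
    R_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return Sigma @ F.T @ R_inv_sqrt

# Hypothetical example with a two-dimensional state and a scalar observation.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
G = np.eye(2)
F = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
Sigma = riccati_euler(A, G, F, R, np.eye(2))
K = filter_gain(Sigma, F, R)
rank_KK = np.linalg.matrix_rank(K @ K.T)
```

Here `K` is a 2 x 1 matrix, so `K @ K.T` has rank at most one and cannot be strictly positive definite, in line with the remark that $\dim(y_t) \ge \dim(x_t)$ is needed.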

With these elementary obstacles standing in the way of a satisfactory martingale treatment of the separation theorem, it is not surprising that a proper formulation of information states for nonlinear problems has not yet been given. It is possible that the Girsanov solution concept is still too strong to give existence of optimal controls for partially-observable systems in any generality.


7.  OTHER  APPLICATIONS


This  section  outlines  briefly  some  other  types  of  optimization  problems  to  which 
martingale  methods  have  been  applied.  The  intention  is  merely  to  indicate  the  martin- 
gale formulation  and  not  to  give  a survey  of  these  problems  as  a whole:  most  of  them 
have  been  extensively  studied  from  other  points  of  view  and  the  associated  literature 
is  enormous.  Nor  is  it  claimed  that  the  martingale  approach  is,  in  all  cases,  the 
most  fruitful. 

7.1  Jump  processes 

A jump process is a piecewise-constant right-continuous process $x_t$ on a probability space $(\Omega, \mathcal{F}, P)$ with values in, say, a complete separable metric space $X$ with Borel $\sigma$-field $\mathcal{S}$. It can be identified with an increasing sequence of times $\{T_n\}$ and a sequence of $X$-valued random variables $\{Z_n\}$ such that

$$x_t = \begin{cases} Z_n, & t \in [T_n, T_{n+1}[ \\ z_\infty, & t \ge T_\infty \end{cases}$$

where $T_\infty = \lim_n T_n$ and $z_\infty$ is a fixed element of $X$. (Generally $T_\infty = \infty$ a.s. in applications.) Jump processes are useful models in operations research (queueing and inventory systems) and optical communication theory, among other areas. Their structure is analysed in Jacod [47], Boel, Varaiya and Wong [12] and Davis [17]. A jump process can be thought of as an integer valued random measure $\mu$ on $E = R^+ \times X$ defined by

$$\mu(\omega, dt, dz) = \sum_n \delta_{(T_n(\omega), Z_n(\omega))}(dt, dz)$$

where $\delta_e$ is the Dirac measure at $e \in E$. Now let

$$\mathcal{F}_t = \sigma\{\mu(]0,s] \times A),\ s \le t,\ A \in \mathcal{S}\} = \sigma\{x_s,\ s \le t\}$$

and let $\mathcal{P}$ be the $\mathcal{F}_t$-predictable $\sigma$-field on $R^+ \times \Omega$. A random measure $\mu$ is predictable if the process

(7.1)  $\int_{]0,t] \times X} g(\omega,s,z)\, \mu(\omega,ds,dz)$

is predictable for all bounded measurable functions $g$ on $(\Omega \times R^+ \times X,\ \mathcal{P} \otimes \mathcal{S})$. The fundamental result of Jacod [47] is that there is a unique predictable random measure $\nu$ such that

(7.2)  $E\Big[\int_E g(s,z)\, \mu(ds,dz)\Big] = E\Big[\int_E g(s,z)\, \nu(ds,dz)\Big]$

for all $g$ as above. $\nu$ is also characterized by the fact that for each $A \in \mathcal{S}$, $\nu(]0,t] \times A)$ is the dual predictable projection (in the sense of Dellacherie [27]) of $\mu(]0,t] \times A)$, i.e. the process

$$q(t,A) = \mu(]0,t] \times A) - \nu(]0,t] \times A)$$

is an $\mathcal{F}_t$-martingale. An explicit construction for $\nu$ in terms of the distributions of the $(T_n, Z_n)$ sequence is given in [23]. We will denote by $\int g\, dq$ integrals of the form $(\int g\, d\mu - \int g\, d\nu)$ where $\int g\, d\mu$ and $\int g\, d\nu$ are defined as in (7.1); then the process

$$g \cdot q_t = \int_{]0,t] \times X} g\, dq$$

is an $\mathcal{F}_t$-martingale for a suitable class of predictable integrands $g$, and the martingale representation theorem [12], [17], [47] states that all $\mathcal{F}_t$-martingales are of this form for some $g$.

Denote

$$\Lambda_t = \nu(]0,t] \times X)$$

For each $\omega$ this is an increasing function of $t$ and evidently the measure it defines on $R^+$ dominates that defined by $\nu(]0,t] \times A)$ for any $A \in \mathcal{S}$. Thus there is a positive function $n(\omega,s,A)$ such that

(7.3)  $\nu(]0,t] \times A) = \int_{]0,t]} n(\omega,s,A)\, d\Lambda_s$

Owing to the existence of regular conditional probabilities it is possible to choose $n$ so that it is measurable and is a probability measure in $A$ for each fixed $(s,\omega)$. The pair $(n,\Lambda)$ is called the local description of the process and has the interpretation that $\Lambda$ is the integrated jump rate: roughly, $d\Lambda_s \approx P[x_{s+ds} \ne x_s \mid \mathcal{F}_s]$ and $n(\omega,s,\cdot)$ is the conditional distribution of $x_s$ given that $x_s \ne x_{s-}$.
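As an illustration of these objects, the sketch below builds the path $x_t$, the counting measure $\mu(]0,t] \times A)$ and the compensated process $q(t,A)$ from a given list of jump times and marks. It assumes the simplest possible local description, namely $\Lambda_t = \text{rate} \cdot t$ with a fixed mark distribution on a finite mark space (a marked Poisson process); the function names, the `initial` value held before the first jump and the sample data are hypothetical.

```python
from bisect import bisect_right

def path_value(jump_times, marks, t, initial):
    """x_t = Z_n for T_n <= t < T_{n+1}; `initial` is the value before the first jump."""
    n = bisect_right(jump_times, t)  # number of jump times T_k with T_k <= t
    return initial if n == 0 else marks[n - 1]

def counting_measure(jump_times, marks, t, A):
    """mu(]0,t] x A): number of jumps up to time t whose mark lies in the set A."""
    return sum(1 for T, z in zip(jump_times, marks) if T <= t and z in A)

def compensated(jump_times, marks, t, A, rate, mark_prob):
    """q(t,A) = mu(]0,t] x A) - nu(]0,t] x A), with the assumed compensator
    nu(]0,t] x A) = rate * t * n(A) for a fixed mark distribution n."""
    nu = rate * t * sum(mark_prob[z] for z in A)
    return counting_measure(jump_times, marks, t, A) - nu

# Hypothetical sample path with three jumps.
times, zs = [0.2, 0.5, 0.9], ["a", "b", "a"]
x_half = path_value(times, zs, 0.5, initial="a")
q_one = compensated(times, zs, 1.0, {"a"}, rate=3.0, mark_prob={"a": 0.5, "b": 0.5})
```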

Optimization problems arise when the local description of the process can be controlled to meet some objective. This is normally formulated [11], [22] by absolutely continuous change of measure, as in §3: we start with a "base measure" $P$ on $(\Omega, \mathcal{F}_1)$ with respect to which the jump process has a local description $(n,\Lambda)$ and define a new measure $P_u$ by

$$\frac{dP_u}{dP} = \mathcal{E}(m^u)_1$$

where $m^u$ is a $(P, \mathcal{F}_t)$ martingale. Under $P_u$ the process $x_t$ has a different local description which can be identified by the translation theorem ( . ). More specifically, it is supposed that the admissible controls $\mathcal{U}$ consist of $\mathcal{F}_t$-predictable, $(U,\mathcal{E})$-valued processes and that a real-valued measurable function $\phi$ on $(R^+ \times \Omega \times X \times U,\ \mathcal{P} \otimes \mathcal{S} \otimes \mathcal{E})$ is given. Denoting $\phi^u(t,\omega,z) = \phi(t,\omega,z,u(t,\omega))$ for $u \in \mathcal{U}$, $m^u$ is defined by

$$m_t^u(\omega) = \int_{]0,t] \times X} \phi^u(s,\omega,z)\, q(\omega,ds,dz)$$

The Doleans-Dade exponential ( . ) then takes the specific form

$$\mathcal{E}(m^u)_t = \exp\Big(-\int_{]0,t]} \int_X \phi^u\, dn\, d\Lambda_s^c\Big) \prod_{T_i \le t} \Big(1 + \phi^u(T_i,Z_i) - \Delta\Lambda_{T_i} \int_X \phi^u(T_i,z)\, n(T_i,dz)\Big) \times \prod_{s \le t}{}' \Big(1 - \Delta\Lambda_s \int_X \phi^u(s,z)\, n(s,dz)\Big)$$

where $\Lambda^c$ is the continuous part of $\Lambda$ and the second product is taken over the countable set of $s$ such that $\Delta\Lambda_s > 0$ and $s \notin \{T_1, T_2, \ldots\}$. Assuming that $E\mathcal{E}(m^u)_1 = 1$, $x_t$ is, under measure $P_u$, a jump process with local description

(7.4)  $\Lambda_t^u = \int_{]0,t] \times X} \Big(1 + \phi^u - \Delta\Lambda_s \int_X \phi^u\, dn\Big)\, \nu(ds,dz)$

$$n^u(s,A) = \frac{\int_A \big(1 + \phi^u - \Delta\Lambda_s \int_X \phi^u\, dn\big)\, n(s,dz)}{\int_X \big(1 + \phi^u - \Delta\Lambda_s \int_X \phi^u\, dn\big)\, n(s,dz)}$$

See [22], [36] for details of these calculations and conditions under which $E\mathcal{E}(m^u)_1 = 1$. Generally, only weak conditions on $\phi$ are needed to ensure that $P_u$ is a probability measure on $\mathcal{F}_{T_n}$ for each $n$ and hence on $\mathcal{F}_{T_\infty}$. If $T_\infty = \infty$ a.s. $(P)$ then extra conditions on $\Lambda$ can be imposed to ensure that $T_\infty = \infty$ a.s. $(P_u)$ and then $P_u$ is a probability on $\mathcal{F}_t$ for each fixed $t$; see [77]. Let us suppose that the control problem is to choose $u \in \mathcal{U}$ so as to minimize

$$J(u) = E_u \Phi$$

where $\Phi$ is a bounded $\mathcal{F}_1$-measurable random variable. Then the problem is in the general framework of §5 and furthermore we have a martingale representation theorem analogous to that of the Brownian case. Thus local conditions for optimality can be obtained by following the steps of §4.
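The change of measure used above can be checked by hand in the simplest case. The sketch below assumes a Poisson process with constant rate (so $\Lambda_t = \text{rate} \cdot t$ is continuous and the $\Delta\Lambda$ terms vanish), a single mark, and a constant $\phi > -1$; under these assumptions the exponential reduces to $\exp(-\phi \cdot \text{rate} \cdot t)(1+\phi)^{N_t}$. The function names and parameter values are illustrative only.

```python
import math

def exponential_density(num_jumps, t, rate, phi):
    """Doleans-Dade exponential for a Poisson process with constant phi:
    exp(-phi * rate * t) * (1 + phi) ** N_t."""
    return math.exp(-phi * rate * t) * (1.0 + phi) ** num_jumps

def poisson_average(f, t, rate, max_jumps=100):
    """E[f(N_t)] under the base measure, summing the Poisson pmf recursively."""
    pmf = math.exp(-rate * t)  # P[N_t = 0]
    total = 0.0
    for k in range(max_jumps + 1):
        total += pmf * f(k)
        pmf *= rate * t / (k + 1)  # P[N_t = k+1] from P[N_t = k]
    return total

t, rate, phi = 1.0, 2.0, 0.5
mass = poisson_average(lambda k: exponential_density(k, t, rate, phi), t, rate)
new_mean = poisson_average(lambda k: k * exponential_density(k, t, rate, phi), t, rate)
```

In this example `mass` is the total mass of the new measure and `new_mean` is the expected number of jumps under it; the latter equals `(1 + phi) * rate * t`, consistent with the jump rate being multiplied by $1 + \phi$.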

Suppose $u^* \in \mathcal{U}$ is optimal. Then by the martingale representation theorem there is an integrand $g$ such that

(7.5)  $E^*[\Phi \mid \mathcal{F}_t] = J(u^*) + \int_{]0,t] \times X} g(s,z)\, q^*(ds,dz)$

where $q^* = \mu - \nu^*$, and $\nu^*$ is the dual projection of $\mu$ under measure $P^*$ (cf. (7.2)). Now let $u \in \mathcal{U}$ be any other control; then we can rewrite (7.5) in the form

(7.6)  $E^*[\Phi \mid \mathcal{F}_t] = J(u^*) + \int_{]0,t] \times X} g\, dq^u + \int_{]0,t] \times X} g\,(d\nu^u - d\nu^*)$

According to the criterion (5.3), $E^*[\Phi \mid \mathcal{F}_t]$ is a $P_u$-submartingale, and hence the last term in (7.6) must be an increasing process. Using (7.3) and the specific forms of local description provided by (7.4), this statement translates into the following result:

Suppose $u^*$ is optimal, let $g$ be as in (7.5) and define

$$h(t,z,\omega) = g(t,z,\omega) - \Delta\Lambda(t,\omega) \int_X g(t,\zeta,\omega)\, n(t,d\zeta)$$

Then for almost all $\omega$

(7.7)  $\int_X h(t,z)\, \phi(t,z,u_t^*)\, n(t,dz) = \min_{u \in U} \int_X h(t,z)\, \phi(t,z,u)\, n(t,dz)$  a.e. $(d\Lambda_t)$

Thus, as in (4.14), the optimal control minimizes a "Hamiltonian." A sufficient condition for optimality similar to (4.15) can also be obtained. In the literature [12], [22], [77] various forms of Hamiltonian appear, depending on the nature of the cost function and the function $\phi$. In [77] an existence theorem along the lines of (4.20) is obtained; however this only holds under very restrictive assumptions, related to the absolute continuity of the measures. In the Brownian case all the measures are mutually absolutely continuous under very natural conditions, and this is crucial in the proof of the existence result, as is seen in (4.18), (4.19). In the jump process context mutual absolute continuity is very unnatural, but one is apparently obliged to insist on it if an existence result is to be obtained.

Finally, let us mention some other work related to the above. Optimality conditions for jump processes are obtained by Kohlmann [50] using Neustadt's extremal theory in a fashion analogous to Haussmann's treatment of the Brownian case [44]. Systems with both Brownian and jump process disturbances are dealt with in Boel and Kohlmann [9], [10] (based on a martingale representation theorem of Elliott [33]) and Lepeltier and Marchal [58]. The survey [13] by Brémaud and Jacod contains an extensive list of references on martingales and point processes.

7.2  Differential  games  [32],  [35],  [73],  [74],  [75],  [76] 

The set-up here is the same as that of §4 except that we suppose $U = U_1 \times U_2 \times \ldots \times U_N$ where each $U_i$ is a compact metric space. Then $\mathcal{U} = \mathcal{U}_1 \times \ldots \times \mathcal{U}_N$ where $\mathcal{U}_i$ is the set of $\mathcal{F}_t$-predictable $U_i$-valued processes, and we assume that each $u^i \in \mathcal{U}_i$ is to be chosen by a player $i$ with the objective of minimizing a personal cost

$$J_i(u) = J_i(u^1, \ldots, u^N) = E_u\Big[\int_0^1 c_i(s,x,u_s)\, ds + \Phi_i(x_1)\Big]$$

($c_i$ and $\Phi_i$ satisfy the same conditions as $c$, $\Phi$ of §4). Thus each player is assumed to have perfect observations of the state process $x_t$.

Various solution concepts are available for this game [76]: $u^* = (u^{*1}, \ldots, u^{*N})$ is

- a Nash equilibrium if there is no $i$ and $u^i \in \mathcal{U}_i$ such that

$$J_i(u^{*1}, \ldots, u^{*(i-1)}, u^i, u^{*(i+1)}, \ldots, u^{*N}) < J_i(u^*)$$

- efficient if there is no $u \in \mathcal{U}$ such that

$$J_i(u) < J_i(u^*) \quad \text{for all } i$$

- in the core if there is no subset $S \subset \{1, 2, \ldots, N\}$ and $u \in \mathcal{U}$ such that

$$J_i(v) < J_i(u^*), \quad i \in S$$

where $v^i = u^i$ for $i \in S$ and $v^i = u^{*i}$ for $i \notin S$.

Thus  an  equilibrium  point  is  one  from  which  it  does  not  pay  any  player  to  deviate 
unilaterally,  a strategy  is  efficient  if  no  strategy  is  better  for  everybody  and  a 
strategy  is  in  the  core  if  no  coalition  can  act  jointly  to  improve  its  lot.  Evidently 
a core  strategy  is  both  efficient  and  an  equilibrium,  but  equilibrium  solutions  are 
not  necessarily  efficient  or  conversely. 
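The distinction between equilibrium and efficiency is already visible in a static two-player game with finitely many actions. The sketch below uses a hypothetical cost table of prisoner's-dilemma type (both players minimize); the functions `is_nash` and `is_efficient` and the numbers are illustrative assumptions, and the finite static game is only an analogy for the dynamic game in the text.

```python
def is_nash(cost, profile, action_sets):
    """True if no single player can lower its own cost by deviating unilaterally."""
    for i, actions in enumerate(action_sets):
        for a in actions:
            deviation = profile[:i] + (a,) + profile[i + 1:]
            if cost[deviation][i] < cost[profile][i]:
                return False
    return True

def is_efficient(cost, profile):
    """True if no other profile is strictly cheaper for every player."""
    base = cost[profile]
    return not any(all(c < b for c, b in zip(other, base)) for other in cost.values())

# Hypothetical costs: action 0 = cooperate, action 1 = defect.
cost = {(0, 0): (1, 1), (0, 1): (3, 0), (1, 0): (0, 3), (1, 1): (2, 2)}
action_sets = [(0, 1), (0, 1)]
nash_profiles = [p for p in cost if is_nash(cost, p, action_sets)]
efficient_profiles = [p for p in cost if is_efficient(cost, p)]
```

In this table mutual defection is the only equilibrium but is not efficient, while mutual cooperation is efficient but not an equilibrium.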


For $u \in \mathcal{U}$ denote $J(u) = (J_1(u), \ldots, J_N(u))$ and let

$$\mathcal{J} = \{J(u) \mid u \in \mathcal{U}\}$$

This is a bounded subset of $R^N$, and a sufficient condition for efficiency of a strategy $u^*$ is the existence of a non-negative vector $\lambda \in R^N$ such that

(7.8)  $\lambda' J(u^*) \le \lambda' \xi$ for all $\xi \in \mathcal{J}$

(see diagram for $N=2$). If $\mathcal{J}$ is convex, this condition is also necessary. It follows from results of Benes [2] (see the remarks following (4.20)) that convexity of the set

$$(f(t,x,U),\ c_1(t,x,U),\ \ldots,\ c_N(t,x,U)) \subset R^{n+N}$$

implies convexity of $\mathcal{J}$. Now (7.8) says that $u^*$ is optimal for the control problem of minimizing the weighted average cost $J_\lambda(u) = \sum_i \lambda_i J_i(u)$.

[Diagram for $N=2$: the set $\mathcal{J}$ in the $(J_1(u), J_2(u))$ plane, supported at the point $J(u^*)$ by a line with normal $\lambda$.]

Fix $u^* \in \mathcal{U}$, and as in §4, let $g^i$, $i = 1, \ldots, N$, be adapted processes such that

$$E_{u^*}\Big[\int_0^1 c_i^{u^*}\, ds + \Phi_i(x_1) \,\Big|\, \mathcal{F}_t\Big] = J_i(u^*) + \int_0^t g_s^i \sigma_s\, dw_s^{u^*}$$

For any other strategy $u \in \mathcal{U}$ the right-hand side can be expressed, as in (4.11), as

$$J_i(u^*) + \int_0^t g_s^i \sigma_s\, dw_s^u + \int_0^t \big(H_s^i(u_s) - H_s^i(u_s^*)\big)\, ds$$

where

$$H_s^i(u) = g_s^i f(s,x,u) + c_i(s,x,u)$$

Combining the remarks above with (4.16) shows that $u^*$ is efficient if there exists $\lambda \in R^N$ such that

(7.9)  $\sum_i \lambda_i H_s^i(u_s^*) \le \sum_i \lambda_i H_s^i(v)$, a.e. for all $v \in U$

Under the convexity hypothesis, this condition is also necessary. $u^*$ is a Nash equilibrium if, for each $i$, $u^{*i}$ minimizes $J_i(u^{*1}, \ldots, u^{*(i-1)}, u, u^{*(i+1)}, \ldots, u^{*N})$ over $u \in \mathcal{U}_i$. Applying condition (4.16) we see that this will be the case if

(7.10)  $H_s^i(u_s^*) \le H_s^i(v)$, a.e. for all $v \in U_i$, $i = 1, 2, \ldots, N$

Thus $u^*$ is an efficient equilibrium if $u_t^*$ minimizes each "private" Hamiltonian as in (7.10) and also minimizes a "social" Hamiltonian (7.9) formed as a certain weighted average of these. Analogous conditions can be formulated under which $u^*$ lies in the core.

For $(t,x,p_i,u) \in R^+ \times \Omega \times R^n \times U$ define the Hamiltonians

$$H^i(t,x,p_i,u) = p_i' f(t,x,u) + c_i(t,x,u)$$

We say that the Nash condition holds if there exist for $i = 1, \ldots, N$ measurable functions $u_i^0(t,x,p_1,\ldots,p_N)$ such that $u_i^0$ is a predictable process for each fixed $p = (p_1, \ldots, p_N)$ and

$$H^i(t,x,p_i,u_1^0(t,x,p), \ldots, u_N^0(t,x,p)) \le H^i(t,x,p_i,u_1^0, \ldots, u_{i-1}^0, v, u_{i+1}^0, \ldots, u_N^0)$$

for all $v \in U_i$, for each $(t,x,p) \in R^+ \times \Omega \times R^{Nn}$.

Uchida shows in [73] that the game has a Nash equilibrium point if the Nash condition holds. The proof is by a contradiction argument using the original formulation of the results of §4 as given in Davis and Varaiya [25]. Conditions under which the Nash condition holds are stated in [74].

Now  consider  the  case  N=2,  J2 (u)  = -J  (u) , so  that  the  game  is  2-person,  O-sum- 
Then  the  core  concept  is  ugatory,  all  strategies  are  efficient  and  an  equilibrium  is 
a saddle  point,  i.e.  a strategy  u*  such  that  (denoting  = J)  for  all  u BU 

0 (u*1,  u2)  <_J(u*^,u*2)  ^JCu^.u*2) 

In  this  case  the  relevant  condition  is  the  Isaacs'  condition:  for  each  (t,x,p) 6R+  x Q x Rn , 
u*g&2  u^  H1(t,x,P,u1,u2)  = ^ um^2  H1  (t.x,?,^,^) 

The main result is analogous to the above, namely that a saddle strategy $u^*$ exists
if the Isaacs' condition holds. The argument, given by Elliott in [32], [35], is
constructive, along the lines leading to the existence result (4.20) for the control
problem. One considers first the situation where the minimizing player I announces
his strategy $u_1 \in \mathcal{U}_1$ in advance. It is immediate from (4.20) that the maximizing
player II has an optimal reply $u_2^0(u_1)$ to this. Now introduce the upper value function

$W_t^+ = \bigwedge_{u_1 \in \mathcal{U}_1} E_{u_1, u_2^0(u_1)} \left[ \int_t^1 c_1(s,x,u_1,u_2^0(u_1))\,ds + \Phi_1(x_1) \,\Big|\, F_t \right]$

An analysis of this somewhat similar to that of §4 shows that player I has a best
strategy, i.e. a strategy $u_1^0 \in \mathcal{U}_1$, such that

$J(u_1^0, u_2^0(u_1^0)) = \min_{u_1 \in \mathcal{U}_1} J(u_1, u_2^0(u_1))$

If it is player II who announces his strategy first, then we can define in an analogous
manner the lower value function $W_t^-$. In general $W_t^+ \ge W_t^-$, but if the Isaacs'
condition holds then $W_t^+ = W_t^-$ and it follows that $u^*$ given by $u^*_1 = u_1^0$, $u^*_2 = u_2^0(u_1^0)$ is
a saddle strategy.
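
As a rough numerical illustration of the Isaacs' condition, the following Python sketch discretizes the two players' control sets and compares min-max with max-min of a table of Hamiltonian values at one fixed $(t,x,p)$. The function name `isaacs_gap`, the control grid and both example Hamiltonians are assumptions chosen for illustration; they are not taken from [32] or [35].

```python
import numpy as np

def isaacs_gap(H):
    """H[i, j] is the Hamiltonian value when player I uses grid point i
    and player II uses grid point j. Player I minimizes, player II maximizes.
    Returns (upper, lower) = (min-max, max-min)."""
    H = np.asarray(H, dtype=float)
    upper = H.max(axis=1).min()   # min over u1 of max over u2
    lower = H.min(axis=0).max()   # max over u2 of min over u1
    return upper, lower

# Hypothetical control grid shared by both players
grid = np.linspace(-1.0, 1.0, 5)
u1, u2 = np.meshgrid(grid, grid, indexing="ij")

# Illustrative Hamiltonian in which u1 and u2 enter additively
p = 0.7
H_separable = p * (u1 + u2) + u1**2 - u2**2

# Illustrative Hamiltonian in which the controls are coupled
H_coupled = (u1 - u2) ** 2

upper_sep, lower_sep = isaacs_gap(H_separable)
upper_cpl, lower_cpl = isaacs_gap(H_coupled)
```

On this grid the additive example has equal upper and lower values, while the coupled example does not. The upper value is never smaller than the lower one, which mirrors the general inequality between the upper and lower value functions.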

A somewhat more restricted version of this result was given by Varaiya in [75],
using a compactness-of-densities argument similar to that of Benes [1] and Duncan and
Varaiya for the control problem. No results are available if the players do not have
complete observations. Some analogous results for a differential game including a
jump process component are given in [49].

7.3  Optimal  stopping  and  impulse  control


In the conventional formulation of optimal stopping one is given a Markov process
$x_t$ on a state space $S$ and a bounded continuous function $\phi$ on $S$, and asked to find a
Markov time $\tau$ such that $E_x \phi(x_\tau) \ge E_x \phi(x_\sigma)$ for all $x \in S$ and Markov times $\sigma$. Let

$\psi(x) = \sup_\sigma E_x \phi(x_\sigma)$

Then under some regularity conditions $\psi$ is the "least excessive majorant" of $\phi$ (i.e.,
$\psi(x) \ge \phi(x)$ and $\psi(x_t)$ is a supermartingale) and the first entrance time of $x_t$ into the
set $\{x: \phi(x) = \psi(x)\}$ is an optimal time. See [4], and the references there. If we
define $X_t = \phi(x_t)$ and $W_t = \psi(x_t)$ then $\tau$ maximizes $E_x X_\tau$ and $\tau = \inf\{t: X_t = W_t\}$. Thus the
optimal stopping problem generalizes naturally as follows.

Let $(\Omega, F, P)$ be a probability space and $(F_t)_{t \ge 0}$ be an increasing, right-continuous,
completed family of sub-$\sigma$-fields of $F$. Let $\mathcal{T}$ denote the set of $F_t$-stopping times and
$X_t$ be a given positive, bounded optional process defined on $[0,\infty]$. The optimal stopping
problem is then to find $T \in \mathcal{T}$ such that

$E X_T = \sup_{S \in \mathcal{T}} E X_S$

This problem is studied by Bismut and Skalli in [8]. The simplest case occurs when
$X_t$ satisfies the following hypothesis:

(7.11)  Let $\{T_n, T\}$ be stopping times such that $T_n \uparrow T$ or $T_n \downarrow T$. Then $E X_{T_n} \to E X_T$.

Criteria under which (7.11) holds are given in [8].

An essential role in this problem is played by the Snell envelope of $X$ introduced
by Mertens [62, Theorem 4]. He shows that the set of all supermartingales
which majorize $X_t$ has a smallest member, denoted $W_t$, which is characterized by the
property that for any stopping time $T$ and $\sigma$-field $G \subset F_T$,

$E[W_T | G] = \operatorname{ess\,sup}_{S \ge T} E[X_S | G]$

Thus in particular for each fixed time $t$

$W_t = \operatorname{ess\,sup}_{S \ge t} E[X_S | F_t]$

so that $W_t$ is the value function for the optimal stopping problem. Under condition
(7.11) $W_t$ is regular [63, VII D33] and hence has the Meyer decomposition

$W_t = M_t - B_t$

where $M_t$ is a martingale and $B_t$ a continuous increasing process with $B_0 = 0$. Now define


$D_0 = \inf\{t > 0: B_t > 0\}$

$A = \{(t,\omega): X_t(\omega) = W_t(\omega)\}$

The debut of $A$ is the stopping time $D_A = \inf\{t: (t,\omega) \in A\}$. It is shown in [8] that
$D_A \le D_0$ and that:

(7.12)  A stopping time $T$ is optimal if and only if the graph of $T$ is contained in
$A$ and $T \le D_0$.

In particular, both $D_A$ and $D_0$ are optimal.
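
The structure of this result can be seen in a discrete-time, finite-horizon analogue, which the following Python sketch works through. It is only an illustration under stated assumptions, not the continuous-time setting of [8]: the reward is defined on a recombining binomial tree, the Snell envelope is computed by backward induction, and the increments of the increasing process come from the discrete Doob decomposition. The names `snell_envelope` and `value_of_first_entrance`, and the discounted put-style reward, are hypothetical.

```python
import numpy as np
from itertools import product

def snell_envelope(X, p=0.5):
    """X[n][k] is the reward at time n after k up-moves (k = 0..n) on a
    recombining binomial tree with up-probability p.
    Returns W (the Snell envelope) and dB, where dB[n] = B_{n+1} - B_n
    in the decomposition W = M - B."""
    N = len(X) - 1
    W = [None] * (N + 1)
    dB = [None] * N
    W[N] = np.asarray(X[N], dtype=float)
    for n in range(N - 1, -1, -1):
        # Conditional expectation E[W_{n+1} | F_n] at each node of level n
        cont = p * W[n + 1][1:] + (1 - p) * W[n + 1][:-1]
        W[n] = np.maximum(np.asarray(X[n], dtype=float), cont)
        dB[n] = W[n] - cont   # nonnegative, known at time n
    return W, dB

def value_of_first_entrance(X, W, p=0.5):
    """Expected reward from stopping at the first n with X_n = W_n,
    obtained by enumerating every up/down path."""
    N = len(X) - 1
    total = 0.0
    for path in product((0, 1), repeat=N):
        prob = float(np.prod([p if s else 1 - p for s in path]))
        n, k = 0, 0
        while X[n][k] < W[n][k]:   # continue while strictly below the envelope
            k += path[n]
            n += 1
        total += prob * X[n][k]
    return total

# Hypothetical reward: discounted put-style payoff on a symmetric walk
N, s0, K, beta = 4, 4.0, 4.0, 0.95
X = [[beta**n * max(K - (s0 + 2 * k - n), 0.0) for k in range(n + 1)]
     for n in range(N + 1)]
W, dB = snell_envelope(X)
first_entrance_value = value_of_first_entrance(X, W)
```

In this discrete setting the first time the reward meets the envelope plays the role of the debut of $A$, and the first `n` with `dB[n] > 0` plays the role of the first time $B$ increases; stopping at the first entrance time attains the value `W[0][0]`.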

This result implies an optimality criterion similar to (5.3): if $T$ is optimal
then $B_{t \wedge T} = 0$ so that $W_{t \wedge T} = M_{t \wedge T}$ is a martingale, and conversely if $W_{t \wedge T}$ is a
martingale then it is easily seen that $T$ must satisfy the conditions of (7.12).

Analogous results can be obtained for processes more general than those
satisfying (7.11); the details are more involved and only $\varepsilon$-optimal stopping times may
exist.


Impulse control: Space precludes any detailed discussion of this topic, but it
should be mentioned that a martingale treatment has been given by Lepeltier and
Marchal [59]. In the simplest type of problem one has a stochastic differential equation

$dx_t = f(x_t)dt + \sigma(x_t)dw_t$

A strategy $\delta = \{T_n, Y_n\}$ consists of an increasing sequence of stopping times $T_n$ and
a sequence of random variables $Y_n$ such that $Y_n$ is $F_{T_n}$-measurable. The corresponding
trajectory is $x_t^\delta$ defined by

$x_0^\delta = x$ (given)

$dx_t^\delta = f(x_t^\delta)dt + \sigma(x_t^\delta)dw_t$

$x_{T_n}^\delta = x_{T_n-}^\delta + Y_n$

The strategy $\delta$ is to be chosen to minimize


$J(\delta) = E\Big[\sum_n I(T_n \ldots) \;\ldots\; \int \ldots\, c(x_s)\,ds\Big]$


A value function and conditions for optimality can be obtained along the lines of
§5. It is worth pointing out that the above system obviously has a Markovian flavor
about it, and indeed it is shown in [59] that the value function is Markovian (i.e.,
at time $t$ it depends on $x^\delta$ only through $x_t^\delta$) even though the controls $\delta$ are merely
assumed to be non-anticipative. Some further remarks on this are given in the next
section.
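
To make the definition of an impulse strategy concrete, the following Python sketch simulates a controlled trajectory with an Euler-Maruyama discretization. It is a minimal sketch under stated assumptions: the threshold rule (intervene when the state reaches `upper`, then reset to `jump_to`) is a hypothetical strategy chosen for illustration, the function name `simulate_impulse` and all parameter values are invented here, and the cost functional is not evaluated. Nothing in it is taken from [59].

```python
import numpy as np

def simulate_impulse(x0, f, sigma, upper, jump_to, T, dt, rng):
    """Euler-Maruyama path of dx = f(x)dt + sigma(x)dw under a threshold
    strategy: whenever the state reaches `upper`, an impulse
    Y_n = jump_to - x moves it to `jump_to`.
    Returns intervention times T_n, impulses Y_n and the sampled path."""
    n_steps = int(round(T / dt))
    x = float(x0)
    path, times, impulses = [x], [], []
    for k in range(1, n_steps + 1):
        dw = rng.normal(0.0, np.sqrt(dt))
        x = x + f(x) * dt + sigma(x) * dw   # diffusion between interventions
        if x >= upper:                      # hitting time of a closed set
            times.append(k * dt)
            impulses.append(jump_to - x)    # Y_n depends only on the past
            x = jump_to                     # x_{T_n} = x_{T_n-} + Y_n
        path.append(x)
    return times, impulses, np.array(path)

# Illustrative run with assumed drift and diffusion coefficients
rng = np.random.default_rng(0)
times, impulses, path = simulate_impulse(
    x0=0.0, f=lambda x: 0.5, sigma=lambda x: 0.3,
    upper=1.0, jump_to=0.0, T=5.0, dt=0.01, rng=rng)
```

The intervention times here are hitting times of the controlled state, so the strategy is non-anticipative and depends on the past only through the current state, in line with the Markovian flavor noted above.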


7.4  Markovian  systems 


Let us return to the problem of §4 and suppose that the system equation and cost are

$dx_t = f(t,x_t,u_t)dt + \sigma(t,x_t)dw_t$

$J(u) = E_u\left[\int_0^1 c(t,x_t,u_t)dt + \Phi(x_1)\right]$,

i.e., we have a diffusion model as considered in §2. In §4 the admissible controls
$\mathcal{U}$ were general non-anticipative functionals but here it seems clear that feedback
controls of the form $u(t,x_t)$ should be adequate. Denote by $M$ the set of measurable
functions $u: [0,1] \times R^n \to U$; then $M \subset \mathcal{U}$ if we identify $u \in M$ with the process $u_t = u(t,x_t)$,
and $x_t$ is a Markov process under measure $P_u$. Thus we can define the Markovian value
function $W^M(t,x)$ as (with obvious notation)

$W^M(t,x) = \inf_{u \in M} E^u_{(t,x)}\left[\int_t^1 c(s,x_s,u_s)ds + \Phi(x_1)\right]$

The conjecture then is that $W^M(t,x_t) = W_t$ a.e. ($W_t$ being defined as in §4) so that in
particular

$\inf_{u \in M} J(u) = \inf_{u \in \mathcal{U}} J(u)$

This is easily established (see [25, §6]) if it can be shown that $W^M$ satisfies a
principle of optimality similar to (4.7). However this is not clear, as there is still,
to my knowledge, no direct proof that the class $M$ satisfies the $\varepsilon$-lattice property. An
argument along the lines given in §5 fails because it involves "mixing" two controls
$u_1, u_2 \in M$ to form a control $v$ by taking

$v_s = u_1(s,x_s) I_A + u_2(s,x_s) I_{A^c}$

where $s \ge t$ and $A \in F_t$. But then $v$ is of course no longer Markov. Thus the results
presented in §6 of [23] must be regarded as incomplete.
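
The failure of the Markov property under mixing can be checked on a toy example. The following Python sketch is an assumption-laden illustration, not part of the argument in §5: the state is a two-step simple random walk, the two feedback laws `u1` and `u2` are hypothetical constants, and the event $A$ is taken to be "the walk is positive at time $t = 1$".

```python
from itertools import product

# Two hypothetical Markov feedback laws u(s, x)
u1 = lambda s, x: +1
u2 = lambda s, x: -1
t = 1  # time at which the event A is observed

def mixed_control(path, s):
    """v_s = u1(s, x_s) on A and u2(s, x_s) off A, where A = {x_t > 0}."""
    in_A = path[t] > 0
    return u1(s, path[s]) if in_A else u2(s, path[s])

# Collect the values taken by v_s at s = 2 for each terminal state x_2
s = 2
values_at_state = {}
for steps in product((-1, 1), repeat=2):
    path = [0]
    for d in steps:
        path.append(path[-1] + d)
    values_at_state.setdefault(path[s], set()).add(mixed_control(path, s))

# States where v_s is not determined by (s, x_s) alone
non_markov_states = {x for x, vals in values_at_state.items() if len(vals) > 1}
```

The state $x_2 = 0$ is reached both from inside and from outside $A$, so the mixed control takes two different values there and cannot be written as a function of $(s, x_s)$.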

This  problem  has  been  dealt  with  in  the  case  of  controlled  Markov  jump  processes 
by  Davis  and  Wan  [26].  There  it  is  possible  to  "mix"  two  controls  in  a more  ingenious 
way  which,  however,  uses  the  special  structure  of  the  sample  paths  very  explicitly 
and  hence  does  not  generalize  to  other  problems.  An  alternative  approach  would  be  to 
start with the value process $W_t$ as previously defined and to show directly that
$W_t = W(t,x_t)$ for some function $W$. This has been done by Lepeltier and Marchal [59]
for  impulse  control  problems  but  again  the  argument  is  very  problem-specific. 

My general conclusion from the above is that the direct martingale approach is
not  particularly  well  adapted  to  Markovian  problems,  and  that  more  information  can  be 
obtained  from  methods  such  as  those  of  Bismut  [5]  which  are  specially  tailored  for 
Markov  processes. 


8.  CONCLUDING  REMARKS 

The successes of martingale methods in control are twofold: firstly the essence
of the optimality principle is revealed in the general formulation (5.3), and in
particular the fundamental difference between the situations of complete and of
incomplete observations is clearly brought out; and secondly, the power of the
submartingale decomposition provides, in effect, a weak form of differentiation which
enables minimum principles and existence of optimal controls to be established with
few technical restrictions. The drawbacks of the method are that it does not lead
naturally to computational techniques, and there are difficulties in handling
Markovian systems and problem formulations of the "separation principle" type.

Here  are  a few  suggestions  for  further  research. 

(8.1)  Obtain a more explicit characterization of the "adjoint process" $g_t$ of
§4. Comparisons with deterministic optimal control theory and other forms of stochastic
minimum principle [6], [53] suggest that it should satisfy some form of "adjoint
equation," yet little is known about this unless the optimal control is smooth [44].

(8.2)  To my knowledge martingale methods have not been applied seriously to
infinite-time problems (see Kushner [55] for some results using methods similar to
those of Bismut [5]).

(8.3)  The partially-observable problem continues to elude a satisfactory
treatment. In particular there are no good existence theorems, and experience with
the separation theorem (§6) suggests that these may be hard to get. My feeling is
that the proper formulation of partially-observable problems must explicitly include
filtering, since it is the conditional distribution of the state given the
observations that is the true "state" of the system. A lot of information about nonlinear
filtering is available [60] but, again using the separation principle as a cautionary
tale, it is far from clear how to incorporate this into the martingale framework.
Possibly some entirely different approach, such as Nisio's nonlinear semigroup
formulation, will turn out to be more appropriate. See [20] for a step in this direction.


(8.4)  Show that the $\varepsilon$-lattice property holds in some generality for Markovian
systems with Markov controls (cf. §7.4).

(8.5)  Give a constructive treatment of Uchida's result [73] on the existence
of Nash equilibrium points in stochastic differential games.

(8.6)  Is mutual absolute continuity of the measures $P_u$ really necessary for
the existence result (4.20)? If not then better existence results could possibly be
obtained for problems such as controlled jump processes (§7.1) where mutual absolute
continuity does not arise so naturally.
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